IMCC 0.0.9.5

imcc is the Intermediate Code Compiler for Parrot.
The language it compiles is currently termed Parrot
Intermediate Language (PIR).

Why? Writing a compiler is a large undertaking. We are trying
to take some of the load off of potential language designers,
including the Perl6 compiler itself. We can provide a
common back-end for Parrot that does:

   Register Allocation and Spillage
   Constant folding and expression evaluation
   Instruction selection.
   Optimization.

This way, language designers can get right to work on

   Tokenizing, parsing, type checking AST/DAG production

Then they can simply feed PIR to imcc which will compile
directly to Parrot bytecode, potentially skipping the
assembler altogether.

So far, all the compiler does (besides translating the PIR to pasm) is
register allocation and spilling. I like Steve Muchnick's MIR language,
and I'm taking a few things from it.

Presently you can write code with unlimited symbolics or named
locals and imcc will translate to pasm.

I expect the IR compiler to focus on staying FAST, simple, and
maintainable, and never develop featuritis, however I want it to be
adequate for all languages targetting parrot. Did I mention that
it needs to be FAST?

We have other options like having imcc to become an assembler in its
own right, which is fine by me, but for now, I think Parrot is changing way
too fast to have another assembler branch.


Register Allocation

  The allocator uses graph-coloring and du-chains to assign registers
  to lexicals and symbolic temporaries. One weakness of the allocator
  is the lack of branch analysis. A brute force method is used in the
  du-chain computation where we assume any symbol is live from the time
  it was first used until either the last time it was used or the last
  branch instruction. This is being replaced with directed graphs of
  basic blocks and flow analysis.

Spilling

  Currently the spiller works with the graph-coloring routines, assigning
  spill weights based on how many times a symbol has been referenced.
  This will be improved later as we start doing more code analysis.
  Since Parrot lacks random access stacks, registers are spilled to
  a fixed PMC array, currently hardcoded as P31. If no spillage is
  required for the block, no array is allocated.

Optimization

  We break the instructions into a directed graph of basic blocks.
  The plan is to translate to SSA form to make optimizations easier.

Why C and Bison?

  Until Perl6 compiles itself and is fast, a Bison parser is
the easiest to maintain. An additional, important benefit, is
C-based parsers are pretty darn fast. Currently assembling
Parrot on the fly is still relatively slow.


More documentation

  s. the docs subdir


Language Reference

Variables, Registers and Constants

  Variables are simple identifiers, anything that is NOT a register
  or constant is a legal identifier. The syntax will have to change
  a bit to support Perl's non-alpha identifiers, which will probably
  spell renaming the registers.

  You may use an infinite number of typed temporaries.
  S=String, I=Int, N=Float, P=Object(or PMC)

   $S0 = 1
   $S1 = $S0 + 2
   ...

  You may also define lexicals with the .local directive below
  and use them by name.

   .local int i
   .local int j
   i = 4
   j = i * i

  Assigning to a constant is illegal syntax.

Directives

  For this reference, I use the following legend:
    reg = A symbolic temporary ($S0, $I25)
    var = A named variable or symbol (i, foo, myArray)
    IDENTIFIER = An optionally quoted name
    lval = reg | var
    rval = reg | var | const

    lval is allowed as targets of instructions, but not constants,
    so we refer to operands on the right side as rvals. lvals
    are also used on the right hand side to indicate that the
    item can be a const or non-const token.

  These are tokens that might not directly translate to a specific
  opcode, and are intentionly left generic. Some of them aren't
  ops at all, but instructions to the compiler.

   .class <IDENTIFIER>
   .endclass <IDENTIFIER>

   .namespace <IDENTIFIER>
   .endnamespace <IDENTIFIER>

   .sym <type> <IDENTIFIER>

   .sub <IDENTIFIER>
   .end

   .local <type> <IDENTIFIER>

   .arg <lval>

   .return <lval>

   .result <lval>

   .param <type> <lval>

   .emit

   .eom


Instructions

  Assignments, calls, branches, etc. Simplified forms of lower
  level operations. Typically is 1-to-1 or 1-to-2 instruction to
  Parrot ratio.


   lval = rval

   lval = rval + rval

   lval = rval - rval

   lval = rval * rval

   lval = rval / rval

   lval = rval % rval

   lval = rval << rval

   lval = rval >> rval

   <S.lval> = <S.rval> . <S.rval>

  For array and string access you can use:

   <S.lval|P.lval> [ rval ] = rval

   lval = <S.lval|P.lval> [ rval ]

   <P.lval> = new <TYPE>

   call <IDENTIFIER>

   dec <lval>

   inc <lval>

   end

   goto <LABEL>

   if <expression> goto <LABEL>

   print rval

   restoreall

   ret

   saveall

Instructions not known to imcc are looked up in parrot's
op_info_table and must have the proper amount and types of
arguments.


The best way to learn PIR quickly is to study the output
of other compilers that use it (Perl6 and Cola), and
consult the source file imcc.y.



Please mail perl6-internals@perl.org with bug-reports or patches.


Original Author and Maintainer:

Melvin Smith <melvin.smith@mindspring.com>, <melvins@us.ibm.com>

Contributing Authors:

Angel Faus <afaus@corp.vlex.com> ... CFG, life analysis
Sean O'Rourke <seano@cpan.org>   ... anyop, iANY
Leopold Toetsch <lt@toetsch.at>  ... numerous bugfixes/cleanup/rewrite
                                     optimizer.c
				     run parrot code inside imcc
Juergen Boemmels <boemmels@physik.uni-kl.de> Macro preprocessor

