-----------------------------------------------------------------------------------
--- Where the Curious User is taken on a Quick Tour of the Compilation Process. ---
-----------------------------------------------------------------------------------

This is the kind of document that is usually missing from any language
implementation, leaving the hapless source-browser completely bewildered, lost
in a directory containing code that isn't even used anymore...  I hope this
helps!

[Ok, well, now this document is really old.  1) The compiler is no longer
 written in Python.  2) It now targets a VM, not the x86.  But much of the
 information here still holds, and is interesting.  Enjoy!]

-----------------------------------------------------------------------------------

The user types in:

  (let ((x 1) (y 2)) (+ x y))

1) READ
--------------------
(lisp_reader.py)

The lisp reader parses this into the following annotated data structure:

  [('symbol', 'let'),
   [[('symbol', 'x'), ('lit', ('int', 1))],
    [('symbol', 'y'), ('lit', ('int', 2))]],
   [('symbol', '+'), ('symbol', 'x'), ('symbol', 'y')]]

Notice how each object in the list is a tuple of (type, value).  [The only
exception to this is the 'list' datatype, which is implied by the Python
list.]  This phase is usually called 'lexical analysis', though it's so
simple with lisp that one would hesitate to use such a hefty term.

2) PARSE
--------------------
(lisp_parser.py)

The expression is then fed to the parser, which picks out the 'special forms'
of Scheme, effectively rewriting the expression as an abstract syntax tree
(AST):

  ['let',
   [['decl', 'x', ('lit', ('int', 1))],
    ['decl', 'y', ('lit', ('int', 2))]],
   [['app', ['varref', '+'], [['varref', 'x'], ['varref', 'y']]]]]

3) EXPAND
--------------------
(transform.py)

The AST is now run through the expression transformer, which rewrites complex
expressions in terms of the simpler ones understood by the compiler:

  ['app',
   ['proc',
    ['x', 'y'],
    ['app', ['varref', '+'], [['varref', 'x'], ['varref', 'y']]]],
   [('lit', ('int', 1)), ('lit', ('int', 2))]]

This is equivalent to rewriting the input as '((lambda (x y) (+ x y)) 1 2)'.
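For concreteness, here is a minimal sketch of that rewrite in Python (the
language the compiler was originally written in).  It assumes only the node
shapes shown above; the real logic lives in transform.py, and 'expand_let' is
just an illustrative name:

  def expand_let (node):
      tag, decls, body = node        # ['let', [<decls>], [<body exprs>]]
      names = [d[1] for d in decls]  # ['x', 'y']
      inits = [d[2] for d in decls]  # [('lit', ('int', 1)), ('lit', ('int', 2))]
      # ((lambda (<names>) <body>) <inits>)
      return ['app', ['proc', names] + body, inits]

Feeding it the AST from the PARSE step yields exactly the 'app' expression
above.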
4) COMPILE
--------------------
(compiler.py)

In this stage the primitive expression is compiled for an abstract register
machine by rewriting the expression in 'continuation-passing style', or CPS.
This in effect 'unwinds' the expression by delivering the result of each
primitive operation to a register; the CPS version of the program is thus a
sequential 'plan' for the execution of the expression:

  ['close',
   ['extend_env', 2, 'edx', 'ecx',
    ['cont', 'ecx', {},
     ['varref', (1, 60), 'ecx',
      ['cont', 0, {'ebx'},
       ['new_tuple', 2,
        ['cont', 1, {'ebx', 0},
         ['varref', (0, 0), 'ecx',
          ['cont', 2, {'ebx', 0, 1},
           ['tuple_store', 2, 1, 0,
            ['cont', 1, {'ebx', 0, 1},
             ['varref', (0, 1), 'ecx',
              ['cont', 2, {'ebx', 0, 1},
               ['tuple_store', 2, 1, 1,
                ['cont', 1, {'ebx', 0, 1},
                 ['tail_apply_closure', 0, 1, 0]]]]]]]]]]]]]]],
   'ecx',
   ['cont', 0, {'ebx'},
    ['new_tuple', 2,
     ['cont', 1, {'ebx', 0},
      ['literal', ('int', 1),
       ['cont', 2, {'ebx', 0, 1},
        ['tuple_store', 2, 1, 0,
         ['cont', 1, {'ebx', 0, 1},
          ['literal', ('int', 2),
           ['cont', 2, {'ebx', 0, 1},
            ['tuple_store', 2, 1, 1,
             ['cont', 1, {'ebx', 0, 1},
              ['tail_apply_closure', 0, 1, 0]]]]]]]]]]]]]

Each continuation (identifiable by the 'cont' tag at the head of its list)
represents 'the rest of the computation'.  Each continuation 'record' (I use
the term loosely here) consists of a target register, an object representing
the set of registers in use (they're actually called 'free' registers, but
that can get awfully confusing), and finally another CPS expression
representing the rest of the computation.

This (admittedly monstrous) expression is then 'flattened' into a
nearly-linear list of instructions:

  [['close',
    [['extend_env', 2, 'edx', 'ecx', 'ecx'],
     ['varref', (1, 60), 'ecx', 0],
     ['new_tuple', 2, 1],
     ['varref', (0, 0), 'ecx', 2],
     ['tuple_store', 2, 1, 0, 1],
     ['varref', (0, 1), 'ecx', 2],
     ['tuple_store', 2, 1, 1, 1],
     ['tail_apply_closure', 0, 1, 0]],
    'ecx', 0],
   ['new_tuple', 2, 1],
   ['literal', ('int', 1), 2],
   ['tuple_store', 2, 1, 0, 1],
   ['literal', ('int', 2), 2],
   ['tuple_store', 2, 1, 1, 1],
   ['tail_apply_closure', 0, 1, 0]]
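The flattening itself is fairly mechanical; here is a minimal Python sketch of
the idea (not the actual compiler.py code, and it ignores the free-register
sets).  Each instruction's trailing 'cont' record is replaced by its bare
target register, and the rest of the computation is spliced in after it:

  def flatten (exp):
      insns = []
      while exp is not None:
          last = exp[-1]
          if isinstance (last, list) and last[0] == 'cont':
              _, target, free, rest = last
              insn = exp[:-1] + [target]
              if insn[0] == 'close':
                  insn[1] = flatten (insn[1])  # flatten the procedure body, too
              insns.append (insn)
              exp = rest
          else:
              insns.append (exp)  # e.g. tail_apply_closure: nothing follows it
              exp = None
      return insns

Note how the body of the 'close' instruction remains a nested list; that is
why the result is only *nearly* linear.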
5) CODE GENERATION
--------------------
(i386.py)

At this point we're not far from machine code.  The flattened list of
abstract instructions is fed through a simple code generator that translates
each instruction into a (usually) fixed bit of Intel assembly...  This is
actually the most complex part of the compiler.

The assembly phase is somewhat complicated by our need to generate COMPLETELY
position-independent code - at runtime the garbage collector must be able to
unpackage any pointer on the heap (including return addresses, which must
carry along an offset to their code objects).  Since even code objects can be
put on the heap, they do not have fixed addresses.  Thus we are often
referencing the instruction pointer (%EIP), which on the i386 can only be
done with a CALL/POP sequence... the CALL insn places a return address on the
stack, which is then popped off into a real register.

        # ------------------------------------------------------------ close -> r0
        movl $0x0000030c, (%esi)        # c7 06 0c 03 00 00
        movl %ecx, 12(%esi)             # 89 4e 0c
        call over1-entry1               # e8 72 00 00 00
entry1:
        # ------------------------------------------------------------ ----- begin procedure -----
        # ------------------------------------------------------------ extend_env
        movl %ecx, 4(%edx)              # 89 4a 04
        movl %edx, %ecx                 # 89 d1
        # ------------------------------------------------------------ check heap
        cmpl %esi, %edi                 # 39 f7
        jg 11                           # 7f 0b
        pushl gc_return1-Lcode          # 68 2a 00 00 00
        call 144(%ebp)                  # ff 95 90 00 00 00
gc_return1:
        # ------------------------------------------------------------ varref (1,60) -> r0
        movl 4(%ecx), %eax              # 8b 41 04
        movl 248(%eax), %eax            # 8b 80 f8 00 00 00
        movl %eax, 0(%ebp)              # 89 45 00
        # ------------------------------------------------------------ new_tuple(2) -> r1
        movl %esi, 4(%ebp)              # 89 75 04
        movl $0x00000308, (%esi)        # c7 06 08 03 00 00
        xorl %eax, %eax                 # 31 c0
        movl %eax, 4(%esi)              # 89 46 04
        movl %eax, 8(%esi)              # 89 46 08
        movl %eax, 12(%esi)             # 89 46 0c
        addl $16, %esi                  # 83 c6 10
        # ------------------------------------------------------------ varref (0,0) -> r2
        movl 8(%ecx), %eax              # 8b 81 08 00 00 00
        movl %eax, 8(%ebp)              # 89 45 08
        # ------------------------------------------------------------ tuple_store r2 -> r1[1]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 8(%eax)              # 89 90 08 00 00 00
        # ------------------------------------------------------------ varref (0,1) -> r2
        movl 12(%ecx), %eax             # 8b 81 0c 00 00 00
        movl %eax, 8(%ebp)              # 89 45 08
        # ------------------------------------------------------------ tuple_store r2 -> r1[2]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 12(%eax)             # 89 90 0c 00 00 00
        # ------------------------------------------------------------ tail_apply_closure r0(r1) -> r0
        movl 0(%ebp), %edx              # 8b 55 00
        movl 12(%edx), %ecx             # 8b 4a 0c
        movl 4(%edx), %eax              # 8b 42 04
        addl 8(%edx), %eax              # 03 42 08
        movl 4(%ebp), %edx              # 8b 55 04
        jmp *%eax                       # ff e0
        # ------------------------------------------------------------ ----- end procedure -----
over1:
        popl 4(%esi)                    # 8f 46 04
        movl entry1-Lcode, %eax         # b8 16 00 00 00
        subl %eax, 4(%esi)              # 29 46 04
        movl %eax, 8(%esi)              # 89 46 08
        movl %esi, 0(%ebp)              # 89 75 00
        addl $0x10, %esi                # 83 c6 10
        # ------------------------------------------------------------ new_tuple(2) -> r1
        movl %esi, 4(%ebp)              # 89 75 04
        movl $0x00000308, (%esi)        # c7 06 08 03 00 00
        xorl %eax, %eax                 # 31 c0
        movl %eax, 4(%esi)              # 89 46 04
        movl %eax, 8(%esi)              # 89 46 08
        movl %eax, 12(%esi)             # 89 46 0c
        addl $16, %esi                  # 83 c6 10
        # ------------------------------------------------------------ literal 1 -> r2
        movl $3, 8(%ebp)                # c7 45 08 03 00 00 00
        # ------------------------------------------------------------ tuple_store r2 -> r1[1]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 8(%eax)              # 89 90 08 00 00 00
        # ------------------------------------------------------------ literal 2 -> r2
        movl $5, 8(%ebp)                # c7 45 08 05 00 00 00
        # ------------------------------------------------------------ tuple_store r2 -> r1[2]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 12(%eax)             # 89 90 0c 00 00 00
        # ------------------------------------------------------------ tail_apply_closure r0(r1) -> r0
        movl 0(%ebp), %edx              # 8b 55 00
        movl 12(%edx), %ecx             # 8b 4a 0c
        movl 4(%edx), %eax              # 8b 42 04
        addl 8(%edx), %eax              # 03 42 08
        movl 4(%ebp), %edx              # 8b 55 04
        jmp *%eax                       # ff e0

(Note that the literals 1 and 2 show up as the immediates $3 and $5: integers
are evidently tagged as 2n+1, so that the garbage collector can tell them
apart from pointers.)

Yes, that's a lot of code for such a simple expression.  Later on (after this
is rewritten in Scheme so that it can be self-hosted) I plan to add some
simple optimizations, which would:

  1) beta-reduce:    ((lambda (x y) (+ x y)) 1 2) => (+ 1 2)
  2) constant-fold:  (+ 1 2) => 3

And that would of course be a much smaller bit of code.

The most important remaining optimization for the compiler is the translation
of eligible tail calls to 'imperative form', which will make things much
faster - currently we _always_ generate argument-list tuples (even for an
empty argument list!), so we're filling up the heap ferociously.  This would
allow, for example, a factorial function that didn't touch the heap at all.

If you're feeling especially curious, you might like to know how the Very
Tiny Register Set is used:

  %eax: scratch
  %edx: scratch
  %esi: free pointer (top of the heap)
  %edi: limit (end of the heap)
  %ecx: environment (a linked list of variable bindings)
  %ebx: control information, usually called the 'stack', though ours isn't one.
  %esp: stack pointer; we keep one return address on top of it.
  %ebp: pseudo-register base.

Since we have essentially NO real registers to work with, we allocate a file
of 64 registers, which the compiler accesses with 8-bit indexed movl insns
(such insns are only 3 bytes long).  These are the register numbers referred
to by the compiled code above.
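To see why those indexed moves are so compact, here is a minimal sketch of the
encoding (this is just standard i386 ModRM arithmetic, not code from i386.py,
and 'movl_to_slot' is an illustrative name).  Pseudo-register n lives at
n*4(%ebp), and the whole instruction is an opcode, a ModRM byte, and an 8-bit
displacement:

  def movl_to_slot (reg, n):
      # movl %<reg>, n*4(%ebp): opcode 0x89, then a ModRM byte with
      # mod=01 (8-bit displacement) and r/m=101 (%ebp), then the disp8.
      disp = n * 4
      assert 0 <= disp < 128          # disp8 is signed; only nearby slots fit
      return bytes ([0x89, 0x40 | (reg << 3) | 0x05, disp])

  # With %eax as hardware register number 0 and slot 0, this gives 89 45 00,
  # exactly the 'movl %eax, 0(%ebp)' emitted for 'varref (1,60) -> r0' above.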