-----------------------------------------------------------------------------------
--- Where the Curious User is taken on a Quick Tour of the Compilation Process. ---
-----------------------------------------------------------------------------------

This is the kind of document that is usually missing from any language
implementation, leaving the hapless source-browser completely bewildered, lost
in a directory containing code that isn't even used anymore...  I hope this
helps!

[Ok, well, now this document is really old.  1) The compiler is no longer
 written in Python.  2) It now targets a VM, not the x86.  But much of the
 information here still holds, and is interesting.  Enjoy!]

-----------------------------------------------------------------------------------

The user types in:

  (let ((x 1) (y 2)) (+ x y))

1) READ
--------------------
(lisp_reader.py)

The lisp reader parses this into the following annotated data structure:

  [('symbol', 'let'),
   [[('symbol', 'x'), ('lit', ('int', 1))],
    [('symbol', 'y'), ('lit', ('int', 2))]],
   [('symbol', '+'), ('symbol', 'x'), ('symbol', 'y')]]

Notice how each object in the list is a tuple of (type, value).  [The only
exception to this is the 'list' datatype, which is implied by the Python
list.]  This phase is usually called 'lexical analysis', though it's so
simple with lisp that one would hesitate to use such a hefty term.

2) PARSE
--------------------
(lisp_parser.py)

The expression is then fed to the parser, which picks out the 'special forms'
of Scheme, effectively rewriting the expression as an abstract syntax tree
(AST):

  ['let',
   [['decl', 'x', ('lit', ('int', 1))],
    ['decl', 'y', ('lit', ('int', 2))]],
   [['app', ['varref', '+'], [['varref', 'x'], ['varref', 'y']]]]]

3) EXPAND
--------------------
(transform.py)

The AST is now run through the expression transformer, which rewrites complex
expressions in terms of the simpler ones understood by the compiler:

  ['app',
   ['proc',
    ['x', 'y'],
    ['app', ['varref', '+'], [['varref', 'x'], ['varref', 'y']]]],
   [('lit', ('int', 1)), ('lit', ('int', 2))]]

This is equivalent to rewriting the input as '((lambda (x y) (+ x y)) 1 2)'.
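For concreteness, here is a minimal sketch of that rewrite in Python (the
language the compiler was originally written in).  It assumes only the node
shapes shown above; the real logic lives in transform.py, and 'expand_let' is
just an illustrative name:

  def expand_let (node):
      tag, decls, body = node        # ['let', [<decls>], [<body exprs>]]
      names = [d[1] for d in decls]  # ['x', 'y']
      inits = [d[2] for d in decls]  # [('lit', ('int', 1)), ('lit', ('int', 2))]
      # ((lambda (<names>) <body>) <inits>)
      return ['app', ['proc', names] + body, inits]

Feeding it the AST from the PARSE step yields exactly the 'app' expression
above.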
4) COMPILE
--------------------
(compiler.py)

In this stage the primitive expression is compiled for an abstract register
machine by rewriting the expression in 'continuation-passing style', or CPS.
This in effect 'unwinds' the expression by delivering the result of each
primitive operation to a register; the CPS version of the program is thus a
sequential 'plan' for the execution of the expression:

  ['close',
   ['extend_env', 2, 'edx', 'ecx',
    ['cont', 'ecx', {},
     ['varref', (1, 60), 'ecx',
      ['cont', 0, {'ebx'},
       ['new_tuple', 2,
        ['cont', 1, {'ebx', 0},
         ['varref', (0, 0), 'ecx',
          ['cont', 2, {'ebx', 0, 1},
           ['tuple_store', 2, 1, 0,
            ['cont', 1, {'ebx', 0, 1},
             ['varref', (0, 1), 'ecx',
              ['cont', 2, {'ebx', 0, 1},
               ['tuple_store', 2, 1, 1,
                ['cont', 1, {'ebx', 0, 1},
                 ['tail_apply_closure', 0, 1, 0]]]]]]]]]]]]]]],
   'ecx',
   ['cont', 0, {'ebx'},
    ['new_tuple', 2,
     ['cont', 1, {'ebx', 0},
      ['literal', ('int', 1),
       ['cont', 2, {'ebx', 0, 1},
        ['tuple_store', 2, 1, 0,
         ['cont', 1, {'ebx', 0, 1},
          ['literal', ('int', 2),
           ['cont', 2, {'ebx', 0, 1},
            ['tuple_store', 2, 1, 1,
             ['cont', 1, {'ebx', 0, 1},
              ['tail_apply_closure', 0, 1, 0]]]]]]]]]]]]]

Each continuation (identifiable by the 'cont' tag at the head of its list)
represents 'the rest of the computation'.  Each continuation 'record' (I use
the term loosely here) consists of a target register, an object representing
the set of registers in use (they're actually called 'free' registers, but
that can get awfully confusing), and finally another CPS expression
representing the rest of the computation.

This (admittedly monstrous) expression is then 'flattened' into a
nearly-linear list of instructions:

  [['close',
    [['extend_env', 2, 'edx', 'ecx', 'ecx'],
     ['varref', (1, 60), 'ecx', 0],
     ['new_tuple', 2, 1],
     ['varref', (0, 0), 'ecx', 2],
     ['tuple_store', 2, 1, 0, 1],
     ['varref', (0, 1), 'ecx', 2],
     ['tuple_store', 2, 1, 1, 1],
     ['tail_apply_closure', 0, 1, 0]],
    'ecx', 0],
   ['new_tuple', 2, 1],
   ['literal', ('int', 1), 2],
   ['tuple_store', 2, 1, 0, 1],
   ['literal', ('int', 2), 2],
   ['tuple_store', 2, 1, 1, 1],
   ['tail_apply_closure', 0, 1, 0]]
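The flattening itself is fairly mechanical; here is a minimal Python sketch of
the idea (not the actual compiler.py code, and it ignores the free-register
sets).  Each instruction's trailing 'cont' record is replaced by its bare
target register, and the rest of the computation is spliced in after it:

  def flatten (exp):
      insns = []
      while exp is not None:
          last = exp[-1]
          if isinstance (last, list) and last[0] == 'cont':
              _, target, free, rest = last
              insn = exp[:-1] + [target]
              if insn[0] == 'close':
                  insn[1] = flatten (insn[1])  # flatten the procedure body, too
              insns.append (insn)
              exp = rest
          else:
              insns.append (exp)  # e.g. tail_apply_closure: nothing follows it
              exp = None
      return insns

Note how the body of the 'close' instruction remains a nested list; that is
why the result is only *nearly* linear.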
5) CODE GENERATION
--------------------
(i386.py)

At this point we're not far from machine code.  The flattened list of
abstract instructions is fed through a simple code generator that translates
each instruction into a (usually) fixed bit of Intel assembly...  This is
actually the most complex part of the compiler.

The assembly phase is somewhat complicated by our need to generate COMPLETELY
position-independent code - at runtime the garbage collector must be able to
unpackage any pointer on the heap (including return addresses, which must
carry along an offset to their code objects).  Since even code objects can be
put on the heap, they do not have fixed addresses.  Thus we are often
referencing the instruction pointer (%EIP), which on the i386 can only be
done with a CALL/POP sequence... the CALL insn places a return address on the
stack, which is then popped off into a real register.

        # ------------------------------------------------------------ close -> r0
        movl $0x0000030c, (%esi)        # c7 06 0c 03 00 00
        movl %ecx, 12(%esi)             # 89 4e 0c
        call over1-entry1               # e8 72 00 00 00
entry1:
        # ------------------------------------------------------------ ----- begin procedure -----
        # ------------------------------------------------------------ extend_env
        movl %ecx, 4(%edx)              # 89 4a 04
        movl %edx, %ecx                 # 89 d1
        # ------------------------------------------------------------ check heap
        cmpl %esi, %edi                 # 39 f7
        jg 11                           # 7f 0b
        pushl gc_return1-Lcode          # 68 2a 00 00 00
        call 144(%ebp)                  # ff 95 90 00 00 00
gc_return1:
        # ------------------------------------------------------------ varref (1,60) -> r0
        movl 4(%ecx), %eax              # 8b 41 04
        movl 248(%eax), %eax            # 8b 80 f8 00 00 00
        movl %eax, 0(%ebp)              # 89 45 00
        # ------------------------------------------------------------ new_tuple(2) -> r1
        movl %esi, 4(%ebp)              # 89 75 04
        movl $0x00000308, (%esi)        # c7 06 08 03 00 00
        xorl %eax, %eax                 # 31 c0
        movl %eax, 4(%esi)              # 89 46 04
        movl %eax, 8(%esi)              # 89 46 08
        movl %eax, 12(%esi)             # 89 46 0c
        addl $16, %esi                  # 83 c6 10
        # ------------------------------------------------------------ varref (0,0) -> r2
        movl 8(%ecx), %eax              # 8b 81 08 00 00 00
        movl %eax, 8(%ebp)              # 89 45 08
        # ------------------------------------------------------------ tuple_store r2 -> r1[1]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 8(%eax)              # 89 90 08 00 00 00
        # ------------------------------------------------------------ varref (0,1) -> r2
        movl 12(%ecx), %eax             # 8b 81 0c 00 00 00
        movl %eax, 8(%ebp)              # 89 45 08
        # ------------------------------------------------------------ tuple_store r2 -> r1[2]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 12(%eax)             # 89 90 0c 00 00 00
        # ------------------------------------------------------------ tail_apply_closure r0(r1) -> r0
        movl 0(%ebp), %edx              # 8b 55 00
        movl 12(%edx), %ecx             # 8b 4a 0c
        movl 4(%edx), %eax              # 8b 42 04
        addl 8(%edx), %eax              # 03 42 08
        movl 4(%ebp), %edx              # 8b 55 04
        jmp *%eax                       # ff e0
        # ------------------------------------------------------------ ----- end procedure -----
over1:
        popl 4(%esi)                    # 8f 46 04
        movl entry1-Lcode, %eax         # b8 16 00 00 00
        subl %eax, 4(%esi)              # 29 46 04
        movl %eax, 8(%esi)              # 89 46 08
        movl %esi, 0(%ebp)              # 89 75 00
        addl $0x10, %esi                # 83 c6 10
        # ------------------------------------------------------------ new_tuple(2) -> r1
        movl %esi, 4(%ebp)              # 89 75 04
        movl $0x00000308, (%esi)        # c7 06 08 03 00 00
        xorl %eax, %eax                 # 31 c0
        movl %eax, 4(%esi)              # 89 46 04
        movl %eax, 8(%esi)              # 89 46 08
        movl %eax, 12(%esi)             # 89 46 0c
        addl $16, %esi                  # 83 c6 10
        # ------------------------------------------------------------ literal 1 -> r2
        movl $3, 8(%ebp)                # c7 45 08 03 00 00 00
        # ------------------------------------------------------------ tuple_store r2 -> r1[1]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 8(%eax)              # 89 90 08 00 00 00
        # ------------------------------------------------------------ literal 2 -> r2
        movl $5, 8(%ebp)                # c7 45 08 05 00 00 00
        # ------------------------------------------------------------ tuple_store r2 -> r1[2]
        movl 4(%ebp), %eax              # 8b 45 04
        movl 8(%ebp), %edx              # 8b 55 08
        movl %edx, 12(%eax)             # 89 90 0c 00 00 00
        # ------------------------------------------------------------ tail_apply_closure r0(r1) -> r0
        movl 0(%ebp), %edx              # 8b 55 00
        movl 12(%edx), %ecx             # 8b 4a 0c
        movl 4(%edx), %eax              # 8b 42 04
        addl 8(%edx), %eax              # 03 42 08
        movl 4(%ebp), %edx              # 8b 55 04
        jmp *%eax                       # ff e0

(Note that the literals 1 and 2 show up as the immediates $3 and $5: integers
are evidently tagged as 2n+1, so that the garbage collector can tell them
apart from pointers.)

Yes, that's a lot of code for such a simple expression.  Later on (after this
is rewritten in Scheme so that it can be self-hosted) I plan to add some
simple optimizations, which would:

  1) beta-reduce:    ((lambda (x y) (+ x y)) 1 2) => (+ 1 2)
  2) constant-fold:  (+ 1 2) => 3

And that would of course be a much smaller bit of code.

The most important remaining optimization for the compiler is the translation
of eligible tail calls to 'imperative form', which will make things much
faster - currently we _always_ generate argument-list tuples (even for an
empty argument list!), so we're filling up the heap ferociously.  This would
allow, for example, a factorial function that didn't touch the heap at all.

If you're feeling especially curious, you might like to know how the Very
Tiny Register Set is used:

  %eax: scratch
  %edx: scratch
  %esi: free pointer (top of the heap)
  %edi: limit (end of the heap)
  %ecx: environment (a linked list of variable bindings)
  %ebx: control information, usually called the 'stack', though ours isn't one.
  %esp: stack pointer; we keep one return address on top of it.
  %ebp: pseudo-register base.

Since we have essentially NO real registers to work with, we allocate a file
of 64 registers, which the compiler accesses with 8-bit indexed movl insns
(such insns are only 3 bytes long).  These are the register numbers referred
to by the compiled code above.
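To see why those indexed moves are so compact, here is a minimal sketch of the
encoding (this is just standard i386 ModRM arithmetic, not code from i386.py,
and 'movl_to_slot' is an illustrative name).  Pseudo-register n lives at
n*4(%ebp), and the whole instruction is an opcode, a ModRM byte, and an 8-bit
displacement:

  def movl_to_slot (reg, n):
      # movl %<reg>, n*4(%ebp): opcode 0x89, then a ModRM byte with
      # mod=01 (8-bit displacement) and r/m=101 (%ebp), then the disp8.
      disp = n * 4
      assert 0 <= disp < 128          # disp8 is signed; only nearby slots fit
      return bytes ([0x89, 0x40 | (reg << 3) | 0x05, disp])

  # With %eax as hardware register number 0 and slot 0, this gives 89 45 00,
  # exactly the 'movl %eax, 0(%ebp)' emitted for 'varref (1,60) -> r0' above.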