# [big discontinuity between thoughts.txt and this]

I've finally started working on pxll in April 2008. The basic compiler
structure is there, and it looks like continuations can work with the
'giant c function' model. A few questions remain:

==================================================

* can calls be made between functions? for example, if a continuation is
  saved for vm1(), can we jump to it from vm2()?
  [this will be necessary if we *ever* want to have more than one file or
   function, which seems likely. as beautiful as the original notion might be]

==================================================

* must remember the *purpose* of this compiler. it's not necessarily going
  to be an implementation language of choice. its purpose is to facilitate
  the writing of a VM! So exposing the fine-grained nastiness of C is going
  to help that. Tagged integers etc. may not be what we want. HOWEVER, we
  need to think clearly on this. It seems likely that the VM will share
  heap+gc, so we may need to distinguish between objects that will live in
  the heap and those that will not (if any).

==================================================

* safety/typechecking/etc.
  this is where I always screw up on these projects. I'm always flying by
  the skin of my teeth, and spend *way* too much time chasing down stupid
  bugs that would have been caught early by some type of checking. Since
  we're generating C, it should be *trivial* to have code that checks these
  things, and it should be *trivial* to turn it off when we want to know
  how fast it can go.

==================================================

* 'runtime'.
  The 'runtime' of pxll will consist of routines useful for implementing
  the VM.

==================================================

* GC. will we be able to write the collector *in* pxll? Hmmm.... this could
  be difficult. Perhaps this is the ultimate challenge for PXLL, because it
  would require no stack/environment at all... possibly a variant of PXLL
  would use the C stack? Well, looking carefully at the cheney copying
  loop, it of course does not use recursion. A fun challenge! (a rough
  sketch of that loop is below, after these notes)
  [my guess is that any initial collector will be written by hand]

==================================================

* known vs unknown functions.
  For the purposes of this VM, *all* functions should be known. Therefore
  we actually should *not* need the &&label feature of gcc.
  [NB: true except for return labels - i.e. return address stored in SAVE tuples]
  [Note: this is not true - functions referenced outside the operator
   position are unknown (i.e., functional programming)]

==================================================

* determine if we can make cross-function calls.
  This could be tricky. If we keep the 'globals' in registers, we'll take a
  performance hit. If we load them up at entry, and restore them at exit,
  it might work - but at a heavy cost. It's possible that we'll have to
  choose:
    a) high performance: all code in one function
    b) lower performance: allow multiple modules
  [The only true escape would be to use native code generation]
  Another approach: use gcc's global register declarations, and therefore
  stop using lexical functions, etc... (see the second sketch below)

==================================================

evaluation order.
I was bothered by the output from (cons 1 (cons 2 (cons 3 ...))): it
consumed registers and saved all the operations for the end. As a test I
parameterized the eval order of both primops and funcalls... the results
were strange, at least with nqueens. The best performance came from
left->right. Perhaps the above example is unrealistic? At some point run
many benchmarks. Better yet, see if a complex-args-first model works better
in general.
[yes, this fixed the problem completely]

==================================================
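To convince myself the cheney loop really needs no recursion, here is a
minimal sketch of the copy/scan phase. The object layout assumed here (a
header word packing a tag and a slot count, immediates marked by the low
bit) is hypothetical - it is not the actual pxll runtime layout - but the
shape of the loop is the point: the region between the scan pointer and the
free pointer *is* the work list, so no stack is needed.

/* cheney.c - illustrative only; types and macros are made up, not the
   real pxll runtime.  every heap object is assumed to be a tuple whose
   header word packs a type tag (low byte) and a slot count. */
#include <stdint.h>
#include <string.h>

typedef uintptr_t word;

#define IS_IMMEDIATE(w)  ((w) & 1)            /* tagged int/char/bool/etc. */
#define HEADER_LEN(h)    ((h) >> 8)           /* number of slots           */
#define FORWARDED        0xff                 /* 'broken heart' tag        */
#define IS_FORWARDED(h)  (((h) & 0xff) == FORWARDED)

static word *to_space;                        /* set up by the allocator   */
static word *free_ptr;

/* copy one object into to-space (unless already copied); return new address.
   NB: a zero-length tuple has no slot to hold the forwarding pointer, so it
   would need special handling (cf. the 'copied duck' sentinel note below). */
static word *
forward (word *obj)
{
  word header = obj[0];
  if (IS_FORWARDED (header)) {
    return (word *) obj[1];
  } else {
    word len = HEADER_LEN (header);
    word *copy = free_ptr;
    memcpy (copy, obj, (len + 1) * sizeof (word));
    free_ptr += len + 1;
    obj[0] = FORWARDED;                       /* leave a broken heart behind */
    obj[1] = (word) copy;
    return copy;
  }
}

/* forward the roots, then scan to-space left to right.  everything between
   <scan> and <free_ptr> is the queue of objects still to be scanned, which
   is exactly why no recursion (and no extra stack) is needed. */
void
gc_copy (word **roots, int nroots)
{
  word *scan = free_ptr = to_space;
  for (int i = 0; i < nroots; i++) {
    if (roots[i] != NULL && !IS_IMMEDIATE ((word) roots[i])) {
      roots[i] = forward (roots[i]);
    }
  }
  while (scan < free_ptr) {
    word len = HEADER_LEN (scan[0]);
    for (word j = 1; j <= len; j++) {
      if (scan[j] != 0 && !IS_IMMEDIATE (scan[j])) {
        scan[j] = (word) forward ((word *) scan[j]);
      }
    }
    scan += len + 1;
  }
}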
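And here is roughly what gcc's global register variable extension looks
like, for the 'keep the globals in registers across multiple functions'
idea above. The variable names, the choice of r14/r15, and the -ffixed
flags are illustrative guesses (and x86-64 specific), not a worked-out
design.

/* global-regs.c - sketch of gcc's global register variables.
   names and register choices are illustrative only. */
#include <stdio.h>

register void *lenv asm ("r15");   /* current environment rib */
register void *k    asm ("r14");   /* current continuation    */

/* every translation unit in the vm would need the same declarations, and
   anything compiled without them should probably be built with
   -ffixed-r15 -ffixed-r14 so gcc never hands those registers out. */

void
vm2 (void)
{
  /* sees whatever vm1() left in the registers - no load/restore dance */
  printf ("lenv=%p k=%p\n", lenv, k);
}

int
main (void)
{
  lenv = (void *) 0x1000;          /* fake values, just to demonstrate */
  k    = (void *) 0x2000;
  vm2 ();
  return 0;
}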
==================================================

INDENT/DEDENT
Most implementations of a python lexer include an extra level of state
management to handle the synthetic indent/dedent tokens. I have a 'feeling'
that using a generator I might be able to make that a trivial wrapper
around the token stream - using local variables...
[yes, this works just fine]

==================================================

binding order.
The yin/yang pitfall test exposes a possible issue w.r.t. the exact order
of binding operations. It will only succeed if the env rib allocation is
held off until the last possible moment (i.e., after the call/cc). Not sure
if this is important.

==================================================

passing arguments in registers.
This should be relatively easy for leaf procedures. The main difficulty is
that we'll have to teach the gc about those extra roots, which isn't much
fun.

crazy idea about passing arguments in registers: while trying to imagine a
way of merging the three binding constructs, it occurs to me that we could
make function definitions look like a non-binding construct with a let*
immediately nested inside it that is automagically populated with the
arguments. passing args via registers would make this a more natural thing?
hmmm... or perhaps pointless, unless we started to talk about making
bindings map to registers again... uff da...

==================================================

noticed a problem with inlining and typechecks.
inlining a small function will consider its args either 'simple' or
'complex'. if a complex arg is used only once, it will be considered
simple, and will thus avoid a let* to hold the arg. however, %%verify calls
frequently introduce extra references to args, which makes nearly all small
functions impossible to inline efficiently. This is especially rough when
trying to avoid allocations inside a loop. possible solutions:
  1) be smarter about removing (or moving) type checks in such a way as to
     avoid let* in loops
  2) type inference!
  3) let-register
  4) when inlining a function, treat %%verify specially - more like a type
     declaration
  5) use type declarations rather than %%verify

==================================================
TASKS.

* simple optional manual typing. [in progress]

* flatten lexical contours. this kind of thing:
    r[2] = ((object *******) lenv) [1][1][1][1][1][1][14];
  is a little ridiculous. I believe this should not be too hard. Since
  closures can't escape, it should be possible to raise/lower ('hoist') the
  binding position. it should be possible to lift every variable bound
  inside a function to the top of it. stupidly, you could just declare them
  all... *or*, you could find a minimal set, like with registers, so that
  otherwise-non-overlapping bindings could share a slot.

* some kind of nary make-tuple primitive that will build a tuple with a
  particular typecode and fill it in with the other args. it would probably
  have to look like %%cexp?

* raise inner functions to the top when they do not close over any
  variables... this avoids creating unnecessary closures inside loops, for
  example.

* fix the 'copied duck' sentinel when used with zero-length tuples.

* consider replacing with a core construct. why? because cascaded still
  results in a bunch of r[0]=#f stores followed by if (r[0] == #f) tests...
  these are *not* getting optimized away!

* consider writing an construct, it might make a big difference with things
  like the lexer, which (when inlined) makes repeated references to
  lenv[1][1][1][1][1][2], which might compile better as a switch? dunno.

* why does tak20 crash on amd64? [probably a gcc bug, -O2 makes it go away]
  need to get a fresher gcc on dark! Urgh, need to look into this - seems
  too fishy - the most-optimized tak20 crashes. is something wrong?

* when calling a known function, only save registers that it might clobber.

* file object

* support [list] constants?

* move all verify() calls into C macros and always emit them. then use
  -DSAFETY=x (actually, the optimizer should be able to remove those calls
  with constant ). a sketch of what this might look like follows this list.

* ideas from rabbit:
  * literals as core values, and hence not needing a continuation. would
    this be worth doing?

* add command-line argument for safety checks

* lambda-lifting/closure-conversion - I believe this requires known
  functions.

* define an important VM task & the performance we demand (e.g., parsing).
  how about reading a large file? for this we need buffers, strings, etc.,
  not to mention access to open/close/read/write.

* add syntactic environments to the compiler

* box in compiler, not code generator?

* determine if we can make cross-function calls
  * Note: with the globals in registers, this *can't* work!
  * Note: gcc can put C globals in registers.

* describe some features/funs of VM-runtime
  * calling external C functions - describing their type signatures

* begin a toy VM in order to flush out issues

* save/load - much fun to be had here - a good demo as well
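Here is roughly what the verify-as-C-macro idea above could look like. The
names (verify, object, TC_PAIR) and the SAFETY levels are invented for the
sketch, not taken from the current runtime - the point is just that the
generated code can always emit the checks and let -DSAFETY decide whether
they survive compilation.

/* safety.c - a guess at the verify-as-macro shape; names are invented. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { long tc; } object;    /* stand-in for the real boxed type */

#define TC_PAIR 0x0a                    /* stand-in typecode */

#ifndef SAFETY
#define SAFETY 1                        /* default: checks on */
#endif

#if SAFETY > 0
#define verify(o, want)                                         \
  do {                                                          \
    if ((o)->tc != (want)) {                                    \
      fprintf (stderr, "type check failed at %s:%d\n",          \
               __FILE__, __LINE__);                             \
      abort ();                                                 \
    }                                                           \
  } while (0)
#else
#define verify(o, want) ((void) 0)      /* -DSAFETY=0: checks vanish */
#endif

int
main (void)
{
  object p = { TC_PAIR };
  verify (&p, TC_PAIR);                 /* always emitted by the compiler */
  return 0;
}

Building with gcc -DSAFETY=0 then makes every check disappear at compile
time, with no change to the generated code itself.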
DONE.

* when accessing the topmost environment is a common activity, store it in
  a variable, and detect it.

* consider re-arranging arguments depending on complexity. looks like
  register pressure is much lighter when you eval complex args earlier
  rather than later. (e.g., (cons 1 (cons 2 (cons 3 ...))))

* implement variable-arity functions (i.e., (lambda x ...)).
  how to do this: quietly convert a dot-vararg into a special name, and
  then require a form like "vararg[n]" to access each element of the
  environment rib. at first we should allow no fixed args, and then later
  add support for them, adjusting the offset of the tuple references.

* would declaring r0, r1, etc. make much of a difference?
  [no: g5: -verbose-gc, tak20=0.096u : yes: 0.098u yeah gcc!]

* fix call/cc: ok, this actually for the most part works again (after
  removing 'register' inlining). However, in the test fun call/cc gets
  inlined, and so it actually fails. This is a separate issue, but still
  need to come up with a test that works.

* (bor ...) => simple boolean implementation, rather than the scheme one
  which *requires* a binding in order to return the first true value.

* ( ...) => eval rator *last*, otherwise it pointlessly consumes a register.

* fix the 'include' special form

* added and then removed code to track line numbers. it made the
  transformer *much* too complicated. Needs to be replaced with an
  automated mechanism, something like macro-by-example.

* identify functions that allocate, use check_heap() at their top.

* INSN class for CPS

* flattened CPS representation

* inline small functions. this should be pretty easy too.

* implement FIX, and maybe LET. would just be .

* split into multiple modules

* switch to real data structures

* identify tail recursion and turn it into goto (or, use registers for
  argument passing)

* known vs unknown functions

* measure the cost of consing loops vs non-consing loops (i.e., what would
  stack allocation be worth, or args in registers?)
  [ok, what I did was special-case tail recursion. a loop that counts down
   from 1 million takes about 8 million cycles to complete]

* optimization: ((lambda (x) ...) y) => (extend-env-and-goto ...)
  [this should probably be redone as a straightforward beta reduction]

* move globals into vm() so they can go into registers

* GC [wow, that was easy]

* rewrite as register compiler

* add an 'inline' capability? ('cexp')

* figure out primops/model-c-types/c-expressions
  * mock up what we think we need, then work through the steps needed to
    get there.

* fix gc to know about free regs

* verify that getcc/putcc work correctly

* add string datatype

* rdtsc() support, measure gc overhead (a sketch of the cycle-counter read
  is at the end of these notes)

benchmarking
[need to be more organized about this - the compiler is changing so
 radically that I might easily miss a huge hit to performance]
[tempted to start putting up numbers, but since I'm likely to replace this
 g5 soon...]
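For the record, a sketch of the kind of cycle-counter read used for those
gc-overhead measurements - this is the x86 rdtsc version; on the g5 the
PowerPC time base would be read instead. The helper name and the harness
around it are just illustration.

/* cycles.c - x86 rdtsc read for rough cycle counting. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t
rdtsc (void)
{
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
}

int
main (void)
{
  uint64_t t0 = rdtsc ();
  /* ... run the thing being measured, e.g. one gc cycle ... */
  uint64_t t1 = rdtsc ();
  printf ("%llu cycles\n", (unsigned long long) (t1 - t0));
  return 0;
}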