Ok, I just made up the name, because I needed a name for the directory. Yes, it sucks.

PLL  == python-like-language
PxLL == pyrex-like-language

Here's the idea: a completely new Python+Pyrex, hopefully implemented mostly in PxLL. Have the underlying API be based on a subset of C++/STL. (A *very* small subset).

Let's first characterize the end product, and then lay out a path to get there.

1) find a subset of STL that can be used to implement something like the Python VM.

2) a Pyrex-like compiler/front-end for it. This means designing the pxll language as well. Certainly, Pyrex itself makes a good starting point. Try to avoid gratuitous differences, so there might be some level of compatibility?

3) Use refcounting, or GC? Or both, like Python? Perhaps this could be considered a benchmark of PxLL's success: if even the GC can be written using it.

4) Let's do a quick survey of important types. Note: in general, we want to avoid speed hacks - try to keep things as simple and as close to STL as we can. At least at first.

  * The Dictionary. An important question about the dictionary - is it used to implement type slots? Or namespaces? Or is it an end-user data structure only? The simplest approach here is to use an stl hash_map. [actually, an even simpler approach would be to use an association list]. An alternative would be to use a - this would get rid of the need for defining 'hashability' on every type, but would impose a partial ordering on all types. Same problem, different color. But it sure would be nice to be able to iterate over sorted dictionaries, wouldn't it?

  * The List. Probably needs to be a

  * The Tuple. Probably an immutable ?

  * The Integer. Damn, it's tempting to make an immediate int.

  * The Float. object wrapper.

  * The BigNum. Hmm... probably best left out at first.

  * The String. Hmmm... probably use the c++ string. Smells like trouble, though. But lots of work already done for us.

5) The VM.
My druthers would be to stick to a Scheme-like VM, rather than reimplement the craziness of the Python VM. And of course, stacklessness. Need to think about possible impacts, though. If we do end up actually disabling a C API, then we'll be requiring folks to go through PxLL, which will support coroutines, if not continuations. Will generated C/C++ code be written in continuation-passing style?

================================================================================
The Path.

What order do we follow here? And what's our primary implementation language? A bit of a chicken & egg problem - do we do pxll first, or pll? Do we design a VM first, then implement a compiler for it?

Another possibility - we could start with something like the lunacy vm, intending to eventually replace it with one written in pxll.

The choices for implementation:

1) the pll compiler: python, eventually (parts) in pxll.
2) the pxll compiler: python, eventually (parts) in pxll.

Note that Pyrex is written completely in Python. So we're really just talking about sitting down and writing some Python code.

Maybe an interesting approach would be to hand-write a core/sample VM implementation using STL, so that we get a feel for what pxll needs to generate.

0) toy VM written in Python.
1) hand-written VM 'sample/demo', hosted in Python. [similar to early lumberjack]
2) simple expression compiler for this VM. [VM is in a module?].
3) beginnings of Pyrex/pxll implementation of VM. beginnings of design of 'api'.
   --- at this point maybe the target language supports integers, variables, and functions. From here, we begin to add sophistication one step at a time to the target language. all other work derives from this.
4) strings, tuples, lists, dictionaries, etc...
5) classes/types (types are already inherent in the design though)
6-n) converge on a final definition of pxll and pll.

[...]

That timeline kinda hand-waves the development of pxll, though.
I think maybe a better starting point is on pxll, following a path similar to the eopl one of progressively adding to the language.

================================================================================
STL vs C.

What STL gives us is a ready-made set of datatypes, iterators, and algorithms. So we don't have to write them ourselves. I guess they should be considered the 'primitive' operations of the 'primitive' types. It's really just a short-cut way of getting something running quickly.

I'm going to guess that nearly all the STL stuff is going to show up *outside* the core VM - they'll show up as primops. So for example we might have an 'iterator.next' primop, and an 'int.add' primop.

================================================================================
C vs C++

If we're using STL just for the data structures, we don't need to get the whole C++ religion. But if we *did* have the C++ religion, we'd probably want objects & methods to be implemented as c++ objects & methods. What are the pros/cons of this?

Myself, I'm very NOT attracted to going c++. But I can see I'd have a hard time arguing with a zealot. Maybe there are other advantages?

================================================================================
Namespaces.

There's a real tension between Python's simple original dictionary-based namespace design, and wanting to do it 'right', with closures and lexically bound variables. Clearly, we want to avoid repeating Python's hackish path from the former to the latter. But what problems will we run into by *starting* with lexical variables?

Can we get away with disallowing dynamic updates to namespaces? (e.g., adding slots to random objects). Many useful objects in Python are already no-updates-allowed.

================================================================================
Continuations.

First off - I'm really leaning toward *not* having a user-visible C API. In other words, the official extension language is pxll.
If you want to access C stuff from it, you *have* to use pxll. This gives us considerable freedom underneath the covers. (the unofficial position will be, for a particular version of the implementation, you're welcome to try to figure out how to do it. good luck.).

I really want the VM itself to be stackless. As we know, this implies that C code compiled against it must be written in continuation-passing style. So, the pxll compiler must generate CPS output. How hard is this?

We need to come up with a really simple example function, and show what the generated code might look like, and how it interacts with the VM.

================================================================================
Starting on PXLL.

This will be similar to the path in EOPL, starting with a simple expression language and adding to it.

SO, step zero is to get a parser going. Sigh. Or I could do lisp. But we really want a parser. But lisp would let us get to the experiment quicker. So let's say we start with a lisp syntax. Add a pyrex-like syntax later. Keep the pyrex syntax in mind while working.

[...]

Really need to understand exactly what language Pyrex defines, because it's quite interesting. It consists of *python*, augmented with c variable declarations and types. Its most important features are the automatic type conversion, and the transparent handling of refcounts and exceptions.

================================================================================
PXLL specifics.

Ok, now that I've started on 'compile_exp', I see there are some unanswered questions. 8^)

1) storage model. [gc'd heap?]
2) register model. [four-register? n-register? c variables?]
3) code model. [many functions? one giant function? CPS?]

The temptation here is to put everything in one giant function.

Pro: gcc can theoretically optimize the hell out of it.
Con: doesn't address how outside code might interact with it? [do we not want to *allow* that? i.e., all code must compile into the VM?
Seems pretty restrictive. can we ignore this issue for now?]

Actually, this is an interesting way to think about 'compilers'. Given a lit-var-app-proc-set! compiler, what parameters are needed to 'actualize' it? Can we write a 'top-level' compiler with any intelligence at all? What 'job' does it do?

================================================================================
operands

we need two (or more?) different kinds of compile_rands().

1) we need a 'register' kind of compile_rands()
2) a tuple-args version - for normal funcalls that extend an environment.
3) a 'dumb' version, that simply concats the strings of arguments together? (for example, "(%+ a (%+ b (%prim0 c) (%prim1 d e)))")

SO: 'primops' are C functions or operators. They are called using the C calling convention. But what happens here:

  (%+ (fun0 a b) (fun1 d e))

In this case, we need to do normal funcalls, that will need save/restore etc... Where do we accumulate the arguments?

Another issue - the *types* of the arguments. Here, presumably fun0 will return an , but %+ expects an integer. So there will have to be a conversion step:

  (%+ (ob->int (fun0 a b)) (ob->int (fun1 d e)))

(Hmmm.. is the conversion function built into the pxll compiler, or is it a primop?)

One temptation is to introduce a C scope:

  { int _temp0 = fun0 (a, b); ... }

But this can't actually work, because fun0 is not a C function. The calling of fun0 will actually be a 'goto'.

SO, we really do need something like the register compiler, even though we're going to be storing into temps? Will we collect all temps of all types, and factor them out? For example, will we discover that there are a maximum of 12 integer temps needed, and declare them that way? Then 7 object temps... What about pointers to other types...

[...]

Spent some time staring at scheme48. They've definitely done some sophisticated stuff. Looks like they split primitives into 'trivial' and 'non-trivial', depending on whether they need continuations.
We may want to do a similar analysis, but maybe let's put it off until later. For now, let's just do the dumb thing. For example, "(a+b)-c" - I'm not sure that a modern C compiler will see that any differently from the version using a temporary.

[...]

Just had a thought. If we end up using a 'register-like' model for temporaries, then it's possible that we can rearrange arguments to avoid save/restore sets? For example, in "(%+ 3 (fun0 x))" we might do the funcall first, so we don't have to save/restore the '3'. That probably makes things 10x more complicated, though.
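
A rough sketch of that reordering idea, in Python (since that's what the compilers would be written in). Everything here is made up for illustration: the names needs_call, order_rands and count_saves are hypothetical, and expressions are nested Python lists standing in for the lisp syntax. It assumes that any application whose head does not start with '%' is a normal funcall, and that a temp which is live across a funcall costs one save/restore. It also ignores the fact that reordering changes evaluation order, which only matters when operands have side effects.

```python
def needs_call(exp):
    """True if evaluating exp performs a normal (non-primop) funcall somewhere."""
    if not isinstance(exp, list) or not exp:
        return False                      # literal or variable reference
    head = exp[0]
    if not (isinstance(head, str) and head.startswith('%')):
        return True                       # assumption: non-'%' head is a funcall
    return any(needs_call(rand) for rand in exp[1:])


def order_rands(rands):
    """Operand indices in evaluation order: operands that make funcalls go first."""
    calls = [i for i, rand in enumerate(rands) if needs_call(rand)]
    plain = [i for i, rand in enumerate(rands) if not needs_call(rand)]
    return calls + plain


def count_saves(rands, order):
    """Count temps that are live across a later funcall (one save/restore each)."""
    saves = 0
    live = 0                              # operands already sitting in temps
    for i in order:
        if needs_call(rands[i]):
            saves += live
        live += 1
    return saves


exp = ['%+', 3, ['fun0', 'x']]            # (%+ 3 (fun0 x))
rands = exp[1:]
print(count_saves(rands, [0, 1]))               # left-to-right order
print(count_saves(rands, order_rands(rands)))   # funcall first
```

With two funcall operands, as in "(%+ (fun0 a b) (fun1 d e))", no ordering avoids the save: whichever result comes first has to survive the second call.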