Irken

The Long-Term Goal

A new dialect and implementation of a python-like language that supports massively scalable cooperative multi-threading, and eventually, erlang-style concurrency.

How

A VM that supports continuations. 'threads' will be built using continuations. The VM will be written in a new high-level language 'Irken' that compiles to C. Continuations will be available in both the target language and the implementation language, and since we want to support millions of threads, we cannot use the C stack. Our approach is to generate continuation-passing style (CPS) code in C.

The Plan

Implement the Irken compiler.
Write a VM for the python-like language in Irken.
Rewrite the irken compiler in the python-like language.
??
Profit!

Currently, I'm in stage 2 - though about 90% of the work is still in stage 1.

[2012] Actually, I skipped a few steps and rewrote the Irken compiler in Irken.

Python-like Language ('PLL')

This should look as much like Python as possible, though it will start life as a much smaller language. My current thinking is that it will be dynamically typed (just like Python), though I may experiment with some kind of type inference. PLL will use a VM, implemented in Irken. Yes, I need a better name than 'PLL'. Originally, I had planned to call the two languages 'Irken-Low' and 'Irken-High', but this would just be confusing since they have almost nothing in common. Also, I believe Irken will be useful on its own.

Irken

Currently, this is a strange hybrid of Scheme and ML/Haskell - although it looks like Scheme, it uses ML-style Hindley-Milner let-polymorphism, with algebraic datatypes. The lisp syntax was chosen for reasons of simplicity and flexibility. Once PLL is available, I may think about giving Irken a pythonic syntax. The main feature missing from the ML/Haskell world is full pattern matching, though a simpler variant-case syntax works pretty well.

Irken compiles to a single large function in a single C file. Function calls are implemented as 'gotos' within this function. A simple register-machine model is used.

I may consider a second backend target to LLVM, although LLVM's thin support for GC will be useless without a stack.

Concurrency: Solve Two Problems, Create One.

Two problems I hope to solve using Erlang-style isolated 'threads':

First, garbage collection. If every thread has its own heap, garbage collection will be naturally 'concurrent' - i.e., we'll avoid having one gigantic collection that sweeps 12GB of memory..

Second, modules / external code. The runtime model of Irken makes it very difficult to link in new compiled code at run-time. If we instead demand erlang-style isolation, it's not a problem - just load up a new 'VM' and run it in a separate process/thread.

IPC: the price to pay - now we need a very efficient way of communicating between process/threads. But we needed that anyway.

Platforms

My current testing platforms include FreeBSD/amd64, FreeBSD/i386, OSX/G5, both 32 and 64-bit. Irken should work on any platform that supports gcc (or any other C compiler that supports the address-of-label extension, which most do...) and some kind of rdtsc-like facility - though if you were desperate you could just strip out the tsc stuff - it's only used for timings.

[Note: looks like clang is out - at least on Snow Leopard the address-of-label feature is silently broken. Also seems to not like lexical functions.]

License

See LICENSE.txt. It's a Simplified BSD License.

The Python code

[2012 note: now that the compiler is self-hosted, you will find these files written in Irken, in the 'self' directory, with '.scm' extensions.

compile.py: drives everything.
lisp_reader.py: reads scheme forms into an s-expression.
input: program
output: s-exp
transform.py: converts all derived scheme expressions into the core understood by the compiler.
input: s-exp
output: s-exp
nodes.py: convert the s-expression into a tree of nodes.
input: s-exp
output: node tree
typing.py: hindley-milner polymorphic type inference [obsolete]
solver.py: constraint-based type inference, based on ATTPL Chapter 10
analyze.py: alpha conversion, tree shaker, inliner, some simplification
input: node tree
output: node tree
cps.py: convert to continuation-passing style, where each continuation is represented by a 'register' assignment.
input: node tree
output: flattened list of insns
backend.py: emit C code implementing the cps insns.
input: insns
output: a C file

The Scheme/Irken Code

lib/*.scm

These are considered 'official' library implementations of runtime features.

core.scm: candidates for compiler builtins (i.e., no need to (include "xxx.scm"))
frb.scm: functional red-black trees. may be used for the symbol table, dicts, etc. (based on Okasaki)
io.scm: core unix i/o
pair.scm: lisp lists
random.scm: random(3)
rbtree.scm: red-black tree code from scheme48
string.scm: string support
symbol: lisp/scheme symbols
vector: lisp/scheme vectors (i.e., python lists)

tests/*.scm

Currently, these are ad-hoc tests used to verify the compiler features as I write them. Eventually, these will turn into automated tests (like the ones in compile.py).

vm/*.scm

will hold the VM for the python-like language.

Quick usage notes.


$ irken tests/tak20.scm
 - compile tests/tak20.scm to
           tests/tak20.c and then
           tests/tak20

Usage: irken <irken-src-file> [options]
 -c : don't compile .c file
 -v : verbose (very!) output
 -t : generate trace-printing code
 -f : set CFLAGS for C compiler
 -m : debug macro expansion
 -O : tell gcc to optimize
 -p : generate profile-printing code

Currently, the compiler assumes that its 'lib' directory and support files are in the current directory (i.e., it assumes it's being run where it was built). This will be fixed eventually with a proper install in /usr/local/.

Last modified: Tue Nov 3 16:41:30 PST 2009