
They use a conservative GC, which I understand, as they were using it before the FTL JIT and it required minimal changes to integrate with the LLVM-based JIT. However, in the blog post, they mention several times that they wanted to avoid stack maps because those would require spilling pointers from registers to the stack, which they say is undesirable for performance reasons.

I wonder, however, how slow register spilling really is. I will test it when I have time, but logically, it shouldn't take up much time. Under the x64 ABI, 6 registers are used for argument passing [1], and the rest of the arguments are passed on the stack. So, when the runtime calls into GC functions, all but at most 6 pointers are already on the stack, at (in theory) predictable locations. Those 6 registers can be pushed to the stack in 6 instructions that take up 8 bytes [2], so the impact on code size should be minimal, and the pushes themselves are probably much cheaper than most other memory accesses. Furthermore, both OCaml and Haskell use register spilling, and while not quite at C-like speeds, they are mostly faster than JS engines and probably also faster than the FTL JIT.

Of course, predicting the stack map after LLVM finishes its optimisations is another thing entirely, but I sincerely hope the developers implement it. EDIT: it seems that LLVM includes some features [3] that allow one to create a stack map, though I wonder if it can be made as efficient as the GHC stack map, which is simply a bitmap/pointer in each stack frame, identifying which words in the frame are pointers and which aren't.

[1] http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_...

[2] tested using https://defuse.ca/online-x86-assembler.htm#disassembly

[3] http://llvm.org/docs/GarbageCollection.html
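For what it's worth, the GHC-style bitmap scheme could be sketched roughly like this (names and layout are mine for illustration, not GHC's actual representation):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: each frame carries one word whose set bits mark
 * which slots hold heap pointers. */
typedef struct {
    uint64_t bitmap;     /* bit i set => slots[i] is a pointer */
    void    *slots[64];
} Frame;

/* Visit each pointer slot the bitmap identifies; returns how many were
 * found. A NULL visitor just counts. */
static int scan_frame(Frame *f, void (*visit)(void **slot)) {
    int found = 0;
    for (int i = 0; i < 64; i++) {
        if (f->bitmap & ((uint64_t)1 << i)) {
            if (visit) visit(&f->slots[i]);
            found++;
        }
    }
    return found;
}
```

The nice property is that scanning a frame is a single loop over a word of bits, with no per-callsite metadata lookup.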



Registers have to be spilled at every location where the GC can be entered, which includes function calls and loop back edges. Having to spill values held in registers across a loop back edge is quite suboptimal.
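A rough sketch of what that forces (illustrative names, not LLVM's actual codegen): a pointer walked in a loop must be written to a stack-map-described slot before every back-edge poll, even though a register-aware map would let it stay in a register:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { long val; struct Node *next; } Node;

int   gc_poll_requested = 0;   /* set asynchronously by the collector */
void *frame_slot;              /* stands for a stack-map-described slot */

static void gc_safepoint(void) {
    /* the collector would read/update frame_slot here */
    gc_poll_requested = 0;
}

long sum_list(Node *p) {
    long sum = 0;
    while (p) {
        sum += p->val;
        p = p->next;
        frame_slot = p;        /* forced spill at the loop back edge */
        if (gc_poll_requested)
            gc_safepoint();
    }
    return sum;
}
```

The extra store runs on every iteration, not just when the GC actually fires, which is the cost being complained about.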


That isn't true with all stack map implementations though: mature ones (such as OCaml's IIRC) have the ability to track registers as well, eliminating the spilling requirement. But LLVM doesn't have support for this yet (although I'll be delighted if Azul lands their patches to implement it!)


Yes, I should clarify this is a limitation of LLVM's current gcroot design.


That's a half-truth. Callee-save registers are a beautiful thing. When LLVM emits code for a function call, with high probability all of the important state will not be spilled - it'll be left in callee-save registers and it will be up to the callee to save those registers if the callee wishes to use them.

All that the GC has to do is force-spill all callee saves onto the stack, or just save them on the side. A typical conservative-on-the-stack (the more general class of collectors to which WebKit's Bartlett-based GC belongs) algorithm for stack scanning looks like:

  void scanConservatively(void* begin, void* end);

  void scanMyStack()
  {
      int fake;
      void* start = &fake;
      void* end = ...; /* the address of where you first entered the VM */
      jmp_buf buf;
      setjmp(buf);
      scanConservatively(start, end);
      scanConservatively(&buf, &buf + 1);
  }

Notice that this uses setjmp() as a hack to extract all callee-save registers.
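For the curious, here's a self-contained toy version of the above (the VM-entry address is replaced by a marker in the caller's frame, and it assumes a downward-growing stack, as on x86/ARM):

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

static size_t words_scanned;

/* Conservatively treat every word in [begin, end) as a potential
 * pointer; a real GC would test each value against the heap. */
static void scanConservatively(void *begin, void *end) {
    for (uintptr_t *p = begin; (void *)p < end; p++)
        words_scanned++;
}

static void scanMyStack(void *vm_entry_frame) {
    int fake;
    void *start = &fake;
    jmp_buf buf;
    setjmp(buf);   /* dumps the callee-save registers into buf */
    scanConservatively(start, vm_entry_frame);
    scanConservatively(&buf, &buf + 1);
}
```

Calling scanMyStack with the address of a local in an outer frame scans everything between the two frames plus the register contents captured by setjmp().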

That is sufficient to: (1) get all of the pointers you're interested in and (2) allow the compiler (in WebKit's case, LLVM) maximum freedom to not spill registers any more than it would in a normal C calling convention where no GC is in play.

Bottom line: Bartlett = zero overhead stack accounting.


My post was talking about LLVM's gcroot mechanism, which forces spills anywhere GC could be invoked. Sorry if that was unclear.


For a similar project I was on -- that is, a managed language built on top of LLVM -- we ensured GC events only came from calls to GC functions, either explicit collects or allocates. Spills of references to the stack only occurred at function calls, not in loops. Our collector was an accurate (precise) collector, though.

This project was targeting game code for the register-heavy, branch-unfriendly, in-order PowerPC cores of the PS3 and Xbox 360. Performance in scalar numerical code was equal to equivalent C++ code.


If I recall correctly, GHC (at least the non-LLVM code generator) doesn't spill at every location where GC could be entered. Instead, it checks whether GC needs to run by testing whether there is space left on the stack and heap (admittedly, a trivial check, thanks to its bump-pointer allocation), and only spills and enters the GC if necessary.
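Roughly, the shape is something like this (an illustrative sketch; GHC actually keeps the heap pointer and limit in dedicated registers, and the details differ):

```c
#include <assert.h>
#include <stddef.h>

#define NURSERY_SIZE 1024

static char  nursery[NURSERY_SIZE];
static char *hp     = nursery;                /* bump pointer  */
static char *hp_lim = nursery + NURSERY_SIZE; /* nursery limit */
static int   gc_entries = 0;

static void enter_gc(void) {
    /* only on this rare path would live pointers be spilled
     * and the collector run */
    gc_entries++;
    hp = nursery;   /* pretend the nursery was fully evacuated */
}

static void *alloc(size_t n) {
    if (hp + n > hp_lim)   /* the heap check: almost always false */
        enter_gc();
    void *p = hp;
    hp += n;
    return p;
}
```

The common-case allocation is just a compare and a bump; the spill code is confined to the slow path.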



