
They use a conservative GC, which I understand, as they were using it before the FTL JIT and it required minimal changes to integrate with the LLVM-based JIT. However, in the blog post, they mention several times that they wanted to avoid stack maps because those would require spilling pointers from registers to the stack, which they say is undesirable for performance reasons.

I wonder, however, how slow register spilling really is. I will test it when I have time, but logically, it shouldn't take up much time. Under the x64 ABI, 6 registers are used for argument passing [1], and the rest of the arguments are passed on the stack. So, when the runtime calls into GC functions, all but at most 6 pointers are already on the stack, at (in theory) predictable locations. Those 6 registers can be pushed to the stack in 6 instructions that take up 8 bytes [2], so the impact on code size should be minimal, and the pushes themselves are probably much cheaper than most other memory accesses. Furthermore, both OCaml and Haskell use register spilling, and while not quite at C-like speeds, they are mostly faster than JS engines and probably also faster than the FTL JIT.

Of course, predicting the stack map after LLVM finishes its optimisations is another thing entirely, but I sincerely hope the developers implement it. EDIT: it seems that LLVM includes some features [3] that allow one to create a stack map, though I wonder if it can be made as efficient as the GHC stack map, which is simply a bitmap/pointer in each stack frame, identifying which words in the frame are pointers and which aren't.

[1] http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_...

[2] tested using https://defuse.ca/online-x86-assembler.htm#disassembly

[3] http://llvm.org/docs/GarbageCollection.html
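For what it's worth, the GHC-style bitmap scheme could be sketched roughly like this (names and layout are mine for illustration, not GHC's actual representation):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: each frame carries one word whose set bits mark
 * which slots hold heap pointers. */
typedef struct {
    uint64_t bitmap;     /* bit i set => slots[i] is a pointer */
    void    *slots[64];
} Frame;

/* Visit each pointer slot the bitmap identifies; returns how many were
 * found. A NULL visitor just counts. */
static int scan_frame(Frame *f, void (*visit)(void **slot)) {
    int found = 0;
    for (int i = 0; i < 64; i++) {
        if (f->bitmap & ((uint64_t)1 << i)) {
            if (visit) visit(&f->slots[i]);
            found++;
        }
    }
    return found;
}
```

The nice property is that scanning a frame is a single loop over a word of bits, with no per-callsite metadata lookup.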



Registers have to be spilled at every location where the GC can be entered, which includes function calls and loop back edges. Having to spill values held in registers across a loop back edge is quite suboptimal.
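A rough sketch of what that forces (illustrative names, not LLVM's actual codegen): a pointer walked in a loop must be written to a stack-map-described slot before every back-edge poll, even though a register-aware map would let it stay in a register:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { long val; struct Node *next; } Node;

int   gc_poll_requested = 0;   /* set asynchronously by the collector */
void *frame_slot;              /* stands for a stack-map-described slot */

static void gc_safepoint(void) {
    /* the collector would read/update frame_slot here */
    gc_poll_requested = 0;
}

long sum_list(Node *p) {
    long sum = 0;
    while (p) {
        sum += p->val;
        p = p->next;
        frame_slot = p;        /* forced spill at the loop back edge */
        if (gc_poll_requested)
            gc_safepoint();
    }
    return sum;
}
```

The extra store runs on every iteration, not just when the GC actually fires, which is the cost being complained about.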


That isn't true with all stack map implementations though: mature ones (such as OCaml's IIRC) have the ability to track registers as well, eliminating the spilling requirement. But LLVM doesn't have support for this yet (although I'll be delighted if Azul lands their patches to implement it!)


Yes, I should clarify this is a limitation of LLVM's current gcroot design.


That's a half-truth. Callee-save registers are a beautiful thing. When LLVM emits code for a function call, with high probability all of the important state will not be spilled - it'll be left in callee-save registers and it will be up to the callee to save those registers if the callee wishes to use them.

All that the GC has to do is force-spill all callee saves onto the stack, or just save them on the side. A typical conservative-on-the-stack (the more general class of collectors to which WebKit's Bartlett-based GC belongs) algorithm for stack scanning looks like:

  void scanConservatively(void* begin, void* end);

  void scanMyStack()
  {
      int fake;
      void* start = &fake;
      void* end = ...; /* the address of where you first entered the VM */
      jmp_buf buf;
      setjmp(buf);
      scanConservatively(start, end);
      scanConservatively(&buf, &buf + 1);
  }

Notice that this uses setjmp() as a hack to extract all callee-save registers.
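For the curious, here's a self-contained toy version of the above (the VM-entry address is replaced by a marker in the caller's frame, and it assumes a downward-growing stack, as on x86/ARM):

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

static size_t words_scanned;

/* Conservatively treat every word in [begin, end) as a potential
 * pointer; a real GC would test each value against the heap. */
static void scanConservatively(void *begin, void *end) {
    for (uintptr_t *p = begin; (void *)p < end; p++)
        words_scanned++;
}

static void scanMyStack(void *vm_entry_frame) {
    int fake;
    void *start = &fake;
    jmp_buf buf;
    setjmp(buf);   /* dumps the callee-save registers into buf */
    scanConservatively(start, vm_entry_frame);
    scanConservatively(&buf, &buf + 1);
}
```

Calling scanMyStack with the address of a local in an outer frame scans everything between the two frames plus the register contents captured by setjmp().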

That is sufficient to: (1) get all of the pointers you're interested in and (2) allow the compiler (in WebKit's case, LLVM) maximum freedom to not spill registers any more than it would in a normal C calling convention where no GC is in play.

Bottom line: Bartlett = zero overhead stack accounting.


My post was talking about LLVM's gcroot mechanism, which forces spills anywhere GC could be invoked. Sorry if that was unclear.


For a similar project I was on -- that is, a managed language built on top of LLVM -- we ensured GC events only came from calls to GC functions, either explicit collects or allocates. Spills of references to the stack only occurred at function calls, not in loops. Our collector was an accurate (precise) collector, though.

This project was targeting game code for the register-heavy, branch-unfriendly, in-order PowerPC cores of the PS3 and Xbox 360. Performance in scalar numerical code was equal to equivalent C++ code.


If I recall correctly, GHC (at least the non-LLVM code generator) doesn't spill at every location where GC could be entered. Instead, it checks whether GC needs to run by testing whether there is space left on the stack and heap (admittedly, a trivial check, thanks to its bump-pointer allocation), and only spills and enters the GC if necessary.
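Roughly, the shape is something like this (an illustrative sketch; GHC actually keeps the heap pointer and limit in dedicated registers, and the details differ):

```c
#include <assert.h>
#include <stddef.h>

#define NURSERY_SIZE 1024

static char  nursery[NURSERY_SIZE];
static char *hp     = nursery;                /* bump pointer  */
static char *hp_lim = nursery + NURSERY_SIZE; /* nursery limit */
static int   gc_entries = 0;

static void enter_gc(void) {
    /* only on this rare path would live pointers be spilled
     * and the collector run */
    gc_entries++;
    hp = nursery;   /* pretend the nursery was fully evacuated */
}

static void *alloc(size_t n) {
    if (hp + n > hp_lim)   /* the heap check: almost always false */
        enter_gc();
    void *p = hp;
    hp += n;
    return p;
}
```

The common-case allocation is just a compare and a bump; the spill code is confined to the slow path.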



