Just out of interest for people not familiar with C and the latest goings-on there: what's the use case where you'd want a garbage collector in C?
My (very surface level) understanding was always that the trade-off for the increased manual effort of using C (manual memory management being one example) was that you could tailor your solution exactly to your use case for increased performance / lower resource use.
If you're going for a garbage collector, why not also benefit from some of the increased language power/features of a higher level language?
Typically, new garbage-collected languages use the Boehm garbage collector (a GC for C) in early versions, and then implement their own once they have time to fully optimize their runtime.
This is what Mono and Golang did. (They both used Boehm until they had time and resources to implement their own runtime-optimized GC.) I suspect Java did too, but I'm not sure.
In this case: "The original motivation for gc is my desire to write my own LISP in C, entirely from scratch - and that required garbage collection."
What basically happens is that gc (and Boehm) are "conservative garbage collectors." They treat all values in a data structure as a potential pointer, because they don't know the contents of the data structure. It's a good "quick and dirty" way to have garbage collection if you can accept the risk that some of your memory will remain uncollected if some of your data happens to have the same value as one of your pointers. In practice, it's a good tradeoff.
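To make "conservative" concrete, here is a toy sketch of the mark phase only. It is not how gc or Boehm are actually implemented; the fixed-size block heap and the names block_for, mark_word and collect are made up for illustration. Any root word whose value lands inside a heap block (including an interior address) is treated as a pointer, and the marked block's words are scanned the same way:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy heap: NBLOCKS fixed-size blocks of BLOCK_WORDS words. */
#define NBLOCKS 4
#define BLOCK_WORDS 4

static uintptr_t heap[NBLOCKS][BLOCK_WORDS];
static int marked[NBLOCKS];

/* Return the index of the block that word points into, or -1.
   The collector can't tell pointers from integers, so any in-range
   value counts, interior addresses included. */
static int block_for(uintptr_t word) {
    uintptr_t lo = (uintptr_t)heap;
    uintptr_t hi = (uintptr_t)(heap + NBLOCKS);  /* one past the end */
    if (word < lo || word >= hi) return -1;
    return (int)((word - lo) / sizeof heap[0]);
}

/* Mark the block word appears to point to, then scan its contents. */
static void mark_word(uintptr_t word) {
    int b = block_for(word);
    if (b < 0 || marked[b]) return;
    marked[b] = 1;
    for (int i = 0; i < BLOCK_WORDS; i++) mark_word(heap[b][i]);
}

/* Mark from the given root words; return how many blocks would survive. */
static int collect(const uintptr_t *roots, int nroots) {
    assert(roots != NULL || nroots == 0);
    memset(marked, 0, sizeof marked);
    for (int i = 0; i < nroots; i++) mark_word(roots[i]);
    int live = 0;
    for (int b = 0; b < NBLOCKS; b++) live += marked[b];
    return live;
}
```

With block 0 pointing at block 1 and roots {address of block 0, 42}, collect() reports 2 live blocks. Replace 42 with an integer that merely happens to equal an address inside block 2 and block 2 is retained as well, which is exactly the "some of your data happens to have the same value as one of your pointers" risk.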
(Other tradeoffs are that you can't have things like real-time garbage collection, generational garbage collection, or compacting garbage collection.)
> If you're going for a garbage collector, why not also benefit from some of the increased language power/features of a higher level language?
Ironically, one of Boehm's use cases is looking for memory leaks. You basically #ifdef Boehm into a test build, and if a GC finds garbage, you know that you didn't free something correctly.
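A rough sketch of the "garbage checker" idea, assuming a test build that routes allocations through hypothetical track_alloc/track_free wrappers (these are not Boehm's API; Boehm ships its own leak-detection mode). Blocks are scanned conservatively, and any block that is still allocated but unreachable from the roots is reported as a leak:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRACKED 64

/* One record per live allocation made through the hypothetical wrappers. */
typedef struct { void *p; size_t n; int reached; } Block;
static Block blocks[MAX_TRACKED];
static int nblocks;

static void *track_alloc(size_t n) {
    assert(nblocks < MAX_TRACKED);
    void *p = calloc(1, n);  /* zeroed, so scanning never reads garbage */
    if (p) blocks[nblocks++] = (Block){ p, n, 0 };
    return p;
}

static void track_free(void *p) {
    for (int i = 0; i < nblocks; i++)
        if (blocks[i].p == p) {
            free(p);
            blocks[i] = blocks[--nblocks];  /* swap-remove the record */
            return;
        }
}

/* Mark the block containing word (interior pointers count), then scan it. */
static void reach(uintptr_t word) {
    for (int i = 0; i < nblocks; i++) {
        uintptr_t lo = (uintptr_t)blocks[i].p;
        if (word >= lo && word < lo + blocks[i].n && !blocks[i].reached) {
            blocks[i].reached = 1;
            size_t nwords = blocks[i].n / sizeof(uintptr_t);
            for (size_t w = 0; w < nwords; w++) {
                uintptr_t v;
                memcpy(&v, (char *)blocks[i].p + w * sizeof v, sizeof v);
                reach(v);
            }
            return;
        }
    }
}

/* Report every still-allocated block that no root can reach. */
static int leak_check(const uintptr_t *roots, int nroots) {
    for (int i = 0; i < nblocks; i++) blocks[i].reached = 0;
    for (int r = 0; r < nroots; r++) reach(roots[r]);
    int leaks = 0;
    for (int i = 0; i < nblocks; i++)
        if (!blocks[i].reached) {
            fprintf(stderr, "leak: %zu bytes at %p\n", blocks[i].n, blocks[i].p);
            leaks++;
        }
    return leaks;
}
```

In a real test build the roots would be the stack, registers and globals rather than an explicit array; passing them in by hand just keeps the sketch self-contained.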
> Ironically, one of Boehm's use cases is looking for memory leaks. You basically #ifdef Boehm into a test build, and if a GC finds garbage, you know that you didn't free something correctly.
So a "garbage checker"? :) This is interesting, I never really thought about using it as a checking tool.
Oddly enough, Unity, a very popular game development platform, ended up sticking with an early version of Mono and Boehm for a very long time (years), and I'm not even sure that it has fully transitioned to something newer yet. (Does anybody know?)
Think about it the other way around — Java, Python, JavaScript all have garbage collectors. What language are those GCs written in? How does the runtime keep track of the objects in those languages, and how is the lifecycle for that tracking managed?
"Writing a GC in C" (or equivalent language) is not so much a strange thing as it is an inevitability.
Some GCs are indeed written in C, often for the convenience of the tooling. But not all of them: Jikes RVM's MMTk is written in Java, PyPy's GC is written in RPython (a restricted subset of Python), and Go's GC is written in Go.
It’s really not. If you’re implementing a language runtime for language X, and the runtime is written in C, then writing the X GC actually means “writing a GC in C, for the subset of C allocations that represent X allocations.”
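A minimal sketch of that idea, assuming a toy Lisp runtime where every cons cell is a malloc'd C struct (Obj, cons, collect and the explicit roots array are all hypothetical names). The "Lisp GC" is just a mark-and-sweep over the subset of C allocations the runtime created for Lisp values; unlike a conservative collector, it knows exactly which fields hold pointers:

```c
#include <assert.h>
#include <stdlib.h>

/* Every Lisp value is a host-level C allocation owned by the runtime. */
typedef struct Obj {
    struct Obj *car, *cdr;  /* a cons cell; NULL stands for nil */
    struct Obj *next;       /* intrusive list of every allocated object */
    int marked;
} Obj;

static Obj *all_objects;    /* head of the allocation list */
static Obj *roots[16];      /* explicit root stack the interpreter maintains */
static int nroots;
static size_t live_count;

static Obj *cons(Obj *car, Obj *cdr) {
    Obj *o = malloc(sizeof *o);
    assert(o != NULL);
    *o = (Obj){ car, cdr, all_objects, 0 };
    all_objects = o;
    live_count++;
    return o;
}

/* Precise marking: only car and cdr are pointers. */
static void mark(Obj *o) {
    while (o && !o->marked) {  /* iterate down cdr, recurse on car */
        o->marked = 1;
        mark(o->car);
        o = o->cdr;
    }
}

static void collect(void) {
    for (int i = 0; i < nroots; i++) mark(roots[i]);
    Obj **link = &all_objects;
    while (*link) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = 0;         /* reset for the next cycle */
            link = &o->next;
        } else {
            *link = o->next;       /* unlink and free the unreachable cell */
            free(o);
            live_count--;
        }
    }
}
```

The interpreter is responsible for pushing live values onto roots; forgetting to do so is the classic bug in hand-written runtimes of this kind.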
E.g. Python objects are represented as PyObject values in CPython. Saying that Python is garbage collected is equivalent to saying PyObjects are garbage collected. Sure enough, PyObject contains the reference count (ob_refcnt) necessary for its GC.
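A heavily simplified, hypothetical analogue of that layout (CPython's real header field is ob_refcnt, manipulated via the Py_INCREF/Py_DECREF macros; the names Object, IntObject, incref and decref below are made up):

```c
#include <assert.h>
#include <stdlib.h>

/* Common header every object starts with, loosely modeled on PyObject. */
typedef struct Object {
    long refcnt;                       /* number of live references */
    void (*dealloc)(struct Object *);  /* type-specific destructor */
} Object;

typedef struct {
    Object head;  /* first member, so an IntObject* can be viewed as Object* */
    long value;
} IntObject;

static int freed;  /* counts deallocations, for demonstration only */

static void int_dealloc(Object *o) {
    freed++;
    free(o);  /* o points at the start of the IntObject allocation */
}

static Object *new_int(long v) {
    IntObject *i = malloc(sizeof *i);
    assert(i != NULL);
    i->head.refcnt = 1;  /* the caller owns the first reference */
    i->head.dealloc = int_dealloc;
    i->value = v;
    return &i->head;
}

static void incref(Object *o) { o->refcnt++; }

static void decref(Object *o) {
    if (--o->refcnt == 0)
        o->dealloc(o);  /* last reference gone: free immediately */
}
```

Pure refcounting can't reclaim reference cycles, which is why CPython also runs a separate cycle-detecting collector (exposed as the gc module).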
I have to second GP: writing a GC for C is strange. Not because it is written in C, but because it is for C. State-of-the-art, efficient, low-latency GCs impose requirements on the memory layout of allocated objects and often on the code generated to access those objects, e.g. read or write barriers of various kinds, sometimes with memory fences. Some GCs, like GHC's or some JVM GCs, have special features integrated into the programming language semantics. Since the semantics of C are basically fixed, a GC for C misses out on these potential improvements and is only ever applicable to a strict subset of C programs.
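One concrete example of a requirement on generated code is a write barrier. The sketch below assumes card marking, a technique used by many generational collectors; the heap size, card size and the write_ref/dirty_cards names are hypothetical. Every pointer store into the heap has to go through the barrier so the collector can later rescan only the dirty cards:

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_WORDS 512
#define CARD_BYTES 512

/* Hypothetical heap of pointer-sized slots, plus one dirty byte per card. */
static void *heap[HEAP_WORDS];
static unsigned char card_table[sizeof heap / CARD_BYTES];

/* Write barrier: perform the store, then mark the slot's card dirty. */
static void write_ref(void **slot, void *val) {
    assert(slot >= heap && slot < heap + HEAP_WORDS);  /* must be a heap slot */
    *slot = val;
    size_t offset = (size_t)((unsigned char *)slot - (unsigned char *)heap);
    card_table[offset / CARD_BYTES] = 1;
}

/* How many cards the collector would have to rescan. */
static int dirty_cards(void) {
    int n = 0;
    for (size_t i = 0; i < sizeof card_table; i++) n += card_table[i];
    return n;
}
```

A compiler for a GC'd language can emit this barrier at every pointer store automatically; in existing C code, any plain obj->field = ptr assignment silently skips it.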
There's a subtlety here that I think you're missing.
The HotSpot JVM is a C++ program; CPython is a C program that implements the Python language. Both CPython and HotSpot mostly manage their own memory in the usual styles of their host languages, but they still need some sort of host language representation for their target language objects.
AIUI, this is less obvious with the JVM (where the runtime manages to avoid a lot of host language allocations), but CPython has an explicit PyObject type, and every target language allocation corresponds to a host-level allocation in some way. Because the lifecycle of these allocations in C is inextricably linked to the Python-level objects' lifecycle, you don't allocate/deallocate them manually on an individual level; you let the GC (refcounting in this case) handle that instead. It just turns out that "letting the GC do it" is really just "letting some other logic in my program do it". This means that implementing the Python GC corresponds exactly to implementing GC for PyObject objects at the C level.
The usecase for this project (according to the README) is:
> The original motivation for gc is my desire to write my own LISP in C, entirely from scratch - and that required garbage collection.
Another reason would be platforms where you cannot run the language of your choice (because of a lack of JVM / Python / whatever implementations), although those might not allow malloc either.
One thing that wasn't mentioned so far is that you may want to use garbage collection only for a portion of your program. You can mix manual malloc/free memory management (plus RAII-style automatic lifetimes) together with GC'd allocation in some parts of your program where you think the tradeoffs make sense.
I could imagine something like this being applicable to large legacy codebases with a history of memory leaks and segfaults. With legacy codebases you're often only able to tread water and won't be in a position to track everything to its root cause.
You've answered your own question. For some parts of the system you use one type of memory management; for other parts you use the GC. If you use a higher-level language for everything, then writing the performance-critical part will be tricky.