Just out of interest for people not familiar with C, latest goings on there etc,...

gwbas1c · on Dec 20, 2019

Typically, new garbage-collected languages use the Boehm garbage collector (a GC for C) in early versions, and then implement their own once they have time to fully optimize their runtime.

This is what Mono and Golang did. (They both used Boehm until they had time and resources to implement their own runtime-optimized GC.) I suspect Java did too, but I'm not sure.

In this case: "The original motivation for gc is my desire to write my own LISP in C, entirely from scratch - and that required garbage collection."

What basically happens is that gc (and Boehm) are "conservative garbage collectors." They treat every value in a data structure as a potential pointer, because they don't know the contents of the data structure. It's a good "quick and dirty" way to have garbage collection if you can accept the risk that some of your memory will remain uncollected because some of your data happens to have the same value as a pointer into it. In practice, it's a good tradeoff.

(Other tradeoffs: because the collector can't tell pointers from data, it can't safely move objects, which rules out compacting garbage collection and makes things like real-time or generational garbage collection much harder.)

> If you're going for a garbage collector, why not also benefit from some of the increased language power/features of a higher level language?

Ironically, one of Boehm's use cases is looking for memory leaks. You basically #ifdef Boehm into a test build, and if a GC finds garbage, you know that you didn't free something correctly.

weberc2 · on Dec 20, 2019

> Ironically, one of Boehm's use cases is looking for memory leaks. You basically #ifdef Boehm into a test build, and if a GC finds garbage, you know that you didn't free something correctly.

So a "garbage checker"? :) This is interesting, I never really thought about using it as a checking tool.

djmips · on Dec 20, 2019

Oddly enough, Unity, a very popular game development platform, ended up sticking with an early version of Mono and Boehm for a very long time (years), and I'm not even sure that it has fully transitioned to something newer yet. (Anybody know?)

pdpi · on Dec 20, 2019

Think about it the other way around — Java, Python, JavaScript all have garbage collectors. What language are those GCs written in? How does the runtime keep track of the objects in those languages, and how is the lifecycle for that tracking managed?

"Writing a GC in C" (or equivalent language) is not so much a strange thing as it is an inevitability.

mypalmike · on Dec 20, 2019

Some GCs are indeed written in C, often for convenience of tooling. But Jikes RVM's collectors are written in Java, PyPy's GC is written in RPython (a restricted subset of Python), and Go's GC is written in Go.

jeen02 · on Dec 20, 2019

Writing a GC for C is pretty strange.

pdpi · on Dec 20, 2019

It’s really not. If you’re implementing a language runtime for language X, and the runtime is written in C, then writing the X GC actually means “writing a GC in C, for the subset of C allocations that represent X allocations.”

E.g. Python objects are represented as PyObject values in CPython. Saying that Python is garbage collected is equivalent to saying PyObjects are garbage collected. Sure enough, PyObject contains the ref count necessary for its GC.

funcDropShadow · on Dec 20, 2019

I have to second GP: writing a GC for C is strange. Not because it is written in C, but because it is for C. State-of-the-art, efficient, low-latency GCs impose requirements on the memory layout of allocated objects, and often on the code generated to access those objects, e.g. memory barriers of various kinds. Some GCs, like GHC's or some JVM GCs, have special features integrated into the programming language semantics. Since the semantics of C are basically fixed, the GC misses out on potential improvements and is only ever applicable to a strict subset of C programs.

pdpi · on Dec 20, 2019

There's a subtlety here that I think you're missing.

The HotSpot JVM is a C++ program that implements Java; CPython is a C program that implements the Python language. Both CPython and HotSpot mostly manage their respective memory in the usual styles of their host languages, but they still need some sort of host language representation for their target language objects.

AIUI, this is less obvious with the JVM (where the runtime manages to avoid a lot of host language allocations), but CPython has an explicit PyObject type, and every target language allocation corresponds to a host level allocation in some way. Because the lifecycle of these allocations in C is inextricably linked to the Python-level objects' lifecycle, you don't allocate/deallocate them manually on an individual level; you let the GC (reference counting in this case) handle that instead. It just turns out that "letting the GC do it" is really just "letting some other logic in my program do it". This means that implementing the Python GC corresponds exactly to implementing GC for PyObject objects at the C level.

Koshkin · on Dec 20, 2019

In the world of C nothing is strange. (Which is part of why it is future-proof.)

antoinealb · on Dec 20, 2019

The use case for this project (according to the README) is:

> The original motivation for gc is my desire to write my own LISP in C, entirely from scratch - and that required garbage collection.

Another reason would be platforms where you cannot run the language of your choice (because of a lack of implementations of JVM / Python / whatever), although those might not allow malloc either.

andrepd · on Dec 20, 2019

One thing that wasn't mentioned so far is that you may want to use garbage collection only for a portion of your program. You can mix manual malloc/free memory management (plus RAII-style automatic lifetimes) together with GC'd allocation in some parts of your program where you think the tradeoffs make sense.

tjpnz · on Dec 20, 2019

I could imagine something like this being applicable to large legacy codebases with a history of memory leaks and segfaults. With legacy codebases you're often only able to tread water and won't be in a position to track everything to its root cause.

pdpi · on Dec 20, 2019

Adding a GC to such a code base is a terribly complex endeavour and would only make things even worse though.

de_watcher · on Dec 20, 2019

You've answered your own question. For one part of the system you do one type of memory management; for another part you use the GC. If you use a higher level language for everything, then writing the performant part will be tricky.

devinalvaro · on Dec 20, 2019

As the author mentioned in the README, it's going to be used for the LISP interpreter he/she is writing from scratch.