
Can somebody please explain, or give me some links on, how FP is supposed to work without a GC? For example, Rust has different closure traits (Fn, FnMut, FnOnce) to guarantee that lifetime analysis is possible (so it is arguable whether it should be considered a functional programming language). On the other hand, the most common FP languages (OCaml, Haskell) all come with a GC.

Or am I wrong in assuming this is a functional language?



Lisp is not necessarily functional. There's a lot of mutation in Common Lisp even if the community favors a functional style.

I think the question is: how does Lisp function without garbage collecting cons cells?

For one, I'm not sure they even rely on cons cells like "real" Lisp. Clojure doesn't either. They cite ML as inspiration.

The Carp language guide states that object lifetimes are statically inferred [1]; my guess is that values are allocated on the stack (or malloc'd/freed by scope) and use-after-free is detected at compile time.

Another, more theoretical approach is using linear types, which require all values to be "consumed" [2].

1: https://github.com/carp-lang/Carp/blob/master/docs/LanguageG...

2: https://www.cs.utexas.edu/users/hunt/research/hash-cons/hash...


Thanks for the links. There's also this Reddit discussion [0] from 2 years ago (it mentions Carp btw.) and an explanation about how Carp manages memory [1].

[0] https://www.reddit.com/r/haskell/comments/d5d13i/is_it_possi...

[1] https://github.com/carp-lang/Carp/blob/master/docs/Memory.md


In particular, the reddit discussion mentions ASAP, which is a set of static analysis algorithms that can work with mutability, generics (polymorphism) and linear types. http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf

As far as I'm aware, this really only applies to single-threaded systems. However, if you implement threadsafe modules and keep the shared stuff internal (use the ASAP algorithms there), you can cover the majority of use cases, including hard-realtime constraints.

Composita is the module system I'm referring to. http://www.composita.net/Composita.html It allows concurrency with managed memory and no GC, through static analysis of the module (component) interface queues.


You're correct. Somewhat like Clojure's Vector, the default collection at runtime in Carp is an Array, which is a heap-allocated C array with an attached length. At compile time, however, the dynamic language uses more standard lispy lists.


Me knows:

"TN-tools in Nokia were automatically compiled into C-code to be run in VAX-computers. Compiled C-code did not have garbage collector, there was separate reclaim-command for discarding used data. If you managed to run your program without ever hearing the BEEP caused by garbage collector, your program was ready for VAXing."

https://timonoko.github.io/Nokolisp.htm


In general, most lisps are imperative, or support multiple paradigms including imperative.

First-class functions are a necessary feature of FP, but the mere presence of the feature does not make a language functional. Some counterexamples include Fortran and Smalltalk.

It's an easy misconception because most of the well-known newer entries to the lisp family - Scheme, Racket, and Clojure - are all mostly functional, and because most of the major non-functional dialects of lisp died out 30 or 40 years ago.


As long as you forbid or mark cycles (…or ignore the problem), lifetime analysis can be done statically, which is better anyway. Automatic memory management doesn't need a GC.


It's really surprising that it's rare in functional languages. Immutability seems like it should guarantee no cycles (?), so reference counting could be used.


Why would immutability guarantee no cycles? Here is a line of valid Haskell:

        star e = let (sp, a) = (Split a e, atom sp) in sp
EDIT: I guess I should probably explain it: star is a function that takes an expression "e" and returns the value "sp", which is "Split a e", where "a" is the result of calling the "atom" function on "sp". This is creating a representation of a regex star operator. Note that the tuple in the let binding exists only to give names to the two values so that they can refer to each other.


I mean, generally tying the knot is a useful technique, but I think these scenarios all require/exploit non-strict evaluation, which is in itself not really immutable in the sense most people use the word.

But yes, such code is often useful so that e.g. a parent XML node can refer to its child nodes while the child nodes can also refer to their parent.

Anyway, I'm not sure about this, but I think you can't have circular data structures in the context of strict evaluation (or can you? maybe by deferring execution via anonymous functions? I wonder....)
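
For what it's worth, here is a quick Rust sketch (my own example, not from anyone above) of the strict-evaluation case: with plain immutable Rc you generally can't close the loop, because each node has to exist before anything can point at it, but interior mutability reopens the door:

    use std::cell::RefCell;
    use std::rc::Rc;

    struct Node {
        name: &'static str,
        next: RefCell<Option<Rc<Node>>>,
    }

    fn main() {
        let a = Rc::new(Node { name: "a", next: RefCell::new(None) });
        let b = Rc::new(Node { name: "b", next: RefCell::new(Some(Rc::clone(&a))) });

        // Mutate after construction to close the cycle a -> b -> a.
        *a.next.borrow_mut() = Some(Rc::clone(&b));

        println!("{} -> {}", a.name, b.name);
        // Plain reference counting never frees this cycle; it leaks.
    }
So under strict evaluation the cycles come from mutation rather than from immutable construction alone (OCaml has a separate escape hatch: let rec can build certain cyclic values directly).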


I'm fairly certain I've done similar things in Ocaml (in fact, I think it's where I learned this technique).


The big thorn is closures, which show up all over the place in FP. You either need to limit closures relative to ordinary functions (as e.g. Rust does with Fn/FnMut/FnOnce), have manual memory annotations (as e.g. Swift does), or you basically need a GC. The former two choices are annoying if you really take advantage of functions as first-class citizens.
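
For readers who haven't seen it, here is a minimal Rust sketch (my example) of that first option: closures are split by what and how they capture, so the compiler can free the captured state without a GC.

    fn call_fn(f: impl Fn() -> usize) -> usize { f() }
    fn call_fn_once(f: impl FnOnce() -> String) -> String { f() }

    fn main() {
        let xs = vec![1, 2, 3];

        // Captures `xs` by shared reference: implements Fn, callable many times,
        // but must not outlive `xs`, which the borrow checker verifies statically.
        let len = || xs.len();
        assert_eq!(call_fn(len), 3);

        // Captures `s` by value and gives it away: only FnOnce, so the compiler
        // knows the captured String is freed exactly once, when the closure runs.
        let s = String::from("hello");
        let take = move || s;
        assert_eq!(call_fn_once(take), "hello");
    }
The annoyance is exactly that these three kinds don't unify the way "just a function value" does in a GC'd language.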


Yes, I thought that might be a problem... I guess Rust is the choice for me.


Reference counting is usually very expensive, because even reading a variable updates the reference count, and ending a scope involves testing the reference count of every variable defined inside the scope and conditionally deallocating the referent.

Without reference counting, here's the end of a hairy function scope that deallocates 15 local variables and restores two callee-saved registers:

        11a6:       48 83 c4 78             add    $0x78,%rsp
        11aa:       5b                      pop    %rbx
        11ab:       41 5e                   pop    %r14
        11ad:       c3                      retq
Now imagine looping over 15 local variables to decrement each reference count, test it, and do a conditional jump based on the result; if that's open-coded, your epilogue is humongous, and if it's in a reference-count-decrementing subroutine, it's going to have mispredicted branches all over the place, costing you maybe 15 cycles each, a total of about 100 cycles for this function. We're talking about adding an order of magnitude of cost to subroutine call, or more. (I think this function doesn't really need 15 local variables; that's the compiler's fault.)

This gets worse with multithreading, because writing to the same reference count on different cores would even in the best case require one core stealing the cache line from another in order to modify it, which may stall the core; but often an atomic reference count increment is even more expensive than that, because it involves a memory barrier.

Reference counting can become reasonably cheap if it's done at large granularity (filesystem files or COM objects, not conses); if you can elide almost all of the reference-count updates, as in Rust; or if your language runtime is just so dog-slow at everything that the extra cost of reference counting isn't that important, like CPython.
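
A minimal Rust sketch of that elision point (my example, not anything from the thread): only an explicit Rc::clone touches the count; borrowing or moving the handle does not.

    use std::rc::Rc;

    fn print_name(name: &str) {
        // Borrowed access: no reference-count update at all.
        println!("{name}");
    }

    fn keep(name: Rc<String>) -> Rc<String> {
        // Taking the Rc by value is a move; still no count update.
        name
    }

    fn main() {
        let name = Rc::new(String::from("carp"));

        print_name(&name);                 // borrow: count stays at 1
        let kept = keep(Rc::clone(&name)); // explicit clone: count goes to 2

        assert_eq!(Rc::strong_count(&name), 2);
        drop(kept);                        // count drops back to 1
        assert_eq!(Rc::strong_count(&name), 1);
    }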

30 years ago or more, before generational GC had gone mainstream, reference counting was a more reasonable choice, because GC was going to be very slow in any case, and ref counting at least used less memory—especially important on machines without cache or virtual memory.

(Purely immutable (applicative) languages like Haskell and Miranda are usually lazy, since that's the payoff for completely abjuring side effects. But lazy evaluation is implemented at the machine level by mutation.)


WRT the multithreading issue, https://lwn.net/SubscriberLink/872869/0e62bba2db51ec7a/ mentions that using atomic memory operations in CPython for Py_INCREF and Py_DECREF slows down the entire CPython interpreter by 60%.


> Reference counting is usually very expensive, because even reading a variable updates the reference count

There are many papers out there on how to elide most ref count operations on locals, and runtimes that use ref counting for automatic memory management typically defer ref count updates in various clever ways (like Nim IIRC). You give up some latency for significant throughput gains.


Aye.


> Reference counting is usually very expensive, because even reading a variable updates the reference count, and ending a scope involves testing the reference count of every variable defined inside the scope and conditionally deallocating the referent.

This is true, but if you can prove things about lifetimes then you can eliminate RC operations with that.

Also, GC is expensive for large processes because 1. peak memory is typically higher 2. you have to scan all of memory for pointers 3. that memory may have been swapped out. Of course this can be optimized too.


> This is true, but if you can prove things about lifetimes then you can eliminate RC operations with that.

Yes, this is true, and in fact with Rust's lifetime analysis you can get excellent performance with reference counting in many places. There is more information about this in https://news.ycombinator.com/item?id=28879401 and https://news.ycombinator.com/item?id=28884775.

> Also, GC is expensive for large processes because 1. peak memory is typically higher 2. you have to scan all of memory for pointers 3. that memory may have been swapped out. Of course this can be optimized too.

It's true that a tracing GC usually requires significantly more memory than manual allocation or reference counting, though occasionally it doesn't, since to implement compacting you basically have to implement a tracing GC, so a GC can sometimes give you lower fragmentation.

It's not true that you have to scan all of memory for pointers, for two reasons:

1. You may have a region of memory that contains no pointers, like Erlang's binary storage.

2. Generational GCs only have to look for pointers in the nursery, the stack, and things marked by the write barrier. If they're GCing a generation that isn't the nursery, at that point they have to scan that generation too, but that's so much less frequent that it doesn't make GC expensive. You say, "Of course this can be optimized too," but actually almost all modern GCs are generational, since the late 01990s. Except...

3. Things like Erlang partition the heap into separate per-process heaps for a large number of lightweight processes. Each of these heaps can be GCed independently because you can't have pointers from one heap to another. "What about unreferenced processes?", you might ask. Well, Erlang doesn't GC processes; you have to explicitly create and destroy them, like a larger-granularity malloc/free, typically using a Rust-like ownership hierarchy to avoid process leaks. This works better than it sounds like it should.


> Generational GCs only have to look for pointers in the nursery, the stack, and things marked by the write barrier. If they're GCing a generation that isn't the nursery, at that point they have to scan that generation too, but that's so much less frequent that it doesn't make GC expensive. You say, "Of course this can be optimized too," but actually almost all modern GCs are generational, since the late 01990s.

I was going to come back and edit this to say "…when you violate an assumption of generational GC" but you somehow managed to instantly reply to my 3 days late post!

Though I'm not too familiar with how GC runtimes are really implemented; I thought generations were more or less based on allocation age but don't know if it actually does analysis to separate allocations that can't reference each other.


I was just lucky! I didn't mean to play gotcha!

(I should write a HN user-agent to watch old threads, though.)

I should preface this by saying I've never actually written a GC, or maintained one someone else wrote, so I might be off-base here; this is just my understanding from reading books and papers:

The awesome thing about generational GC is that (normally) the generations are physically separate in memory, so the GC can avoid even looking at objects in generations older than the one it's emptying out. (Unless they might contain pointers to objects in that generation, that is, which is detected by the write barrier.)

This combines nicely with the potential killer advantage of copying collectors: they don't need to even look at garbage. So if your nursery (or Garden of Eden, if you prefer) is 8MiB and contains only 128 live objects averaging 64 bytes each, a non-concurrent nursery GC consists of walking the stack and any marked older objects, copying those 8192 bytes into the next generation while adjusting the relevant pointers, then resetting the allocation pointer to the beginning of the nursery. The other, say, 130,944 objects that were allocated in the nursery aren't even looked at by the GC, because nothing points to them. (Unless they have finalizers (destructors).) So the GC examines and copies on average 0.06 bytes per object allocated.

Initializing the objects in the first place, as you must do whether they're allocated by decrementing a stack pointer or anything else, is far more expensive.

Of course, most programs retain a larger fraction of their nursery objects than that, stack allocation makes better use of memory caches than a multi-megabyte nursery, and if your program keeps retaining nursery objects long enough, it will eventually need to GC one of the other generations. So GC still isn't free even in the throughput sense. But it's competitive with manual dynamic allocation.
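
To make the "never even look at the garbage" point concrete, here is a deliberately toy sketch (mine, with indices standing in for pointers, a single object type, and the live roots handed in explicitly; a real collector traces the object graph and fixes up interior pointers):

    // Toy copying nursery. Not a real collector.
    struct Obj {
        value: u64,
    }

    struct Nursery {
        slots: Vec<Obj>, // bump-allocated region
        capacity: usize,
    }

    struct OldGen {
        slots: Vec<Obj>,
    }

    impl Nursery {
        fn new(capacity: usize) -> Self {
            Nursery { slots: Vec::with_capacity(capacity), capacity }
        }

        // Allocation is just a bounds check plus a bump of the length.
        fn alloc(&mut self, value: u64) -> Option<usize> {
            if self.slots.len() == self.capacity {
                None // nursery full: run a minor collection first
            } else {
                self.slots.push(Obj { value });
                Some(self.slots.len() - 1)
            }
        }

        // Minor collection: copy only the live objects (the roots) into the
        // old generation and fix up their "pointers", then reset the bump
        // pointer. The dead objects are never examined.
        fn collect(&mut self, roots: &mut [usize], old: &mut OldGen) {
            for root in roots.iter_mut() {
                old.slots.push(Obj { value: self.slots[*root].value });
                *root = old.slots.len() - 1; // adjusted to its new home
            }
            self.slots.clear(); // "resetting the allocation pointer"
        }
    }

    fn main() {
        let mut nursery = Nursery::new(8);
        let mut old = OldGen { slots: Vec::new() };

        let a = nursery.alloc(1).unwrap();
        let _b = nursery.alloc(2).unwrap(); // becomes garbage, never touched again
        let c = nursery.alloc(3).unwrap();
        let mut roots = [a, c];

        nursery.collect(&mut roots, &mut old);
        assert_eq!(old.slots.len(), 2);
        assert!(nursery.slots.is_empty());
    }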


There are two ways to get cycles in Haskell.

One is through “tying the knot”. E.g. to create an infinite list of 1,1,1,1,1,…

ones = 1 : ones

This will actually be compiled to a list cell with a pointer back to itself. You can construct more complicated self-referential data structures thanks to laziness.

The other way you could get a cycle is that it actually does have mutable data structures, although their use is restricted so they can’t have any observable mutable effect in pure code. But you have e.g. IORef which is basically a mutable pointer.

If you wanted no cycles you would need to eliminate some subset of laziness, knot-tying transformations, recursion, and any support for mutable data structures. But yes, I think it could be done.


From the Carp docs:

Memory management is handled by static analysis, a value is owned by the function where it was created. When a value is returned or passed to another function the initial function will give up ownership of it and any subsequent use will lead to a compiler error. To temporarily lend a value to another function (for example to print it) a reference must be created, using the ref special form (or the & reader macro).

and

achieved through a linear type system where memory is owned by the function or let-scope that allocated it. When the scope is exited the memory is deleted, unless it was returned to its outer scope or handed off to another function (passed as an argument).
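
That reads almost word for word like Rust's ownership and borrowing rules, so a Rust sketch of the same discipline may help (this is not Carp syntax, just the analogous model):

    fn consume(s: String) {
        // `consume` now owns `s`; it is freed here when this scope ends.
        println!("consumed: {s}");
    }

    fn lend(s: &String) {
        // Borrowed: the caller keeps ownership, nothing is freed here.
        println!("borrowed: {s}");
    }

    fn main() {
        let s = String::from("hello");
        lend(&s);      // temporary loan, like Carp's `ref` / `&`
        consume(s);    // ownership handed off to the callee
        // consume(s); // compile error: use after move, caught statically
    }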


So how is memory fragmentation avoided?


That's a question for the underlying allocator, surely?

(Quite often the answer is "it isn't")


Not just for the allocator. I always thought a main point of a garbage collector was heap compaction (shuffling things around so there is more contiguous space), but maybe I am wrong.


Nah, only moving collectors (copying or generational ones, for example) do heap compaction. A simple mark-and-sweep collector doesn't, and neither does reference counting; both of those are used by many Lisp or Lisp-like languages.

Nothing can substitute for a really good allocator.


So what happens to long running lisp programs? Running out of memory due to fragmentation eventually?


One would expect at least a provision for a stop-the-world phase, where all mutator threads get to wait for a massive defragmentation.


Not every GC has a compaction phase, though; generational ones do by design.


And Rust is surely a highly imperative language that has absorbed functional idioms (where appropriate), rather than something that falls under the banner of "functional language"?


There's a reason why functional languages all use a GC. And that reason makes Rust not such a good language for functional idioms unless you stay within very strict lines.


Using value semantics for everything is one way.
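
If I understand the suggestion, something like this minimal Rust sketch of "value semantics everywhere" (my example): every owner gets its own copy, so each copy is freed at the end of its scope with no counts or tracing.

    #[derive(Clone, Debug)]
    struct Point {
        x: f64,
        y: f64,
    }

    // Takes its argument by value and returns a new value; nothing is shared.
    fn shifted(p: Point, dx: f64) -> Point {
        Point { x: p.x + dx, y: p.y }
    }

    fn main() {
        let a = Point { x: 1.0, y: 2.0 };
        let b = shifted(a.clone(), 3.0); // explicit copy; `a` stays usable
        println!("{:?} {:?}", a, b);
    }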


Lisp is definitely not as functional as, for instance, Haskell. Freedom from side effects was never really a central concern for Lispers.


You are right; Lisp is far more versatile. There is even a statically typed language, Coalton [1], embedded in Common Lisp.

[1] https://coalton-lang.github.io/20211010-introducing-coalton/



