Can someone share insight into what was technically done to enable this? What replaced the global lock? Does the GC stop all threads during collection, or is there another locking mechanism?
The most interesting idea in my opinion is biased reference counting [0].
An oversimplified (and maybe wrong) explanation of it goes like this:
Problem:
- each object needs a reference counter, because of how memory management in Python works
- we cannot modify ref counters concurrently because it will lead to incorrect results
- we cannot make each ref counter atomic because atomic operations carry too much performance overhead
Therefore, we need the GIL.
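To make the "incorrect results" point concrete, here is a tiny deterministic replay of the classic lost update. It is not real threading, just the interleaving written out by hand; the names `a_read` and `b_read` are made up for illustration.

```python
# Hypothetical refcount of some object, starting at 1.
counter = 1

# Two threads each want to run `counter += 1`, which is really
# three steps: read, add, write. Replay one unlucky interleaving:
a_read = counter        # thread A reads 1
b_read = counter        # thread B reads 1 before A has written back
counter = a_read + 1    # thread A writes 2
counter = b_read + 1    # thread B also writes 2, overwriting A's update

# Two references were added, but the counter only went up by one.
print(counter)
```

With a count that is too low, the object can be freed while something still points at it, which is why unsynchronized ref counters are not an option.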
Solution, proposed in [0]:
- let's have two ref counters for each object: one normal, the other atomic
- the normal ref counter counts references created from the thread that originally created the object; the atomic one counts references from other threads
- empirically, objects are mostly accessed from the thread that created them, so most of the time we avoid paying the penalty for atomic operations
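A toy sketch of the two-counter idea in Python, purely to show the shape of it. Everything here is an assumption for illustration: the class name `BiasedRefCount`, its fields, and the use of a `threading.Lock` as a stand-in for an atomic add. The real implementation lives in C inside the object header and handles deallocation and merging of the counters, which this skips entirely.

```python
import threading


class BiasedRefCount:
    """Toy model of the two counters; not CPython's actual layout."""

    def __init__(self):
        self.owner = threading.get_ident()    # thread that created the object
        self.local = 1                        # touched only by the owner, unsynchronized
        self.shared = 0                       # touched by every other thread
        self._shared_lock = threading.Lock()  # stand-in for an atomic add

    def incref(self):
        if threading.get_ident() == self.owner:
            # Fast path: plain increment, no synchronization needed.
            self.local += 1
        else:
            # Slow path: synchronized update of the shared counter.
            with self._shared_lock:
                self.shared += 1

    def total(self):
        # Simplification: the real scheme merges the counters more carefully.
        return self.local + self.shared


rc = BiasedRefCount()
rc.incref()  # taken from the owning thread, hits the fast path

# Three references taken from other threads hit the slow path.
workers = [threading.Thread(target=rc.incref) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(rc.local, rc.shared, rc.total())  # 2 3 5
```

If the empirical observation holds, almost all traffic lands on `local`, so the synchronized path is rarely taken.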
Anyway, that's what I understood from the articles/papers. See my other comment [1] for the links to write-ups by people who actually know what they're talking about.
AFAIK the initial prototype, called nogil, was developed by Sam Gross, who also wrote a detailed article [0] about it.
He also had a meeting with the Python core developers. Notes from this meeting [1] by Łukasz Langa provide a more high-level overview, so I think they are a good starting point.
The key enabling tech is thread-safe reference counting. There are many other problems that Sam Gross solved in order to make it happen, but reference counting was one of the major blockers.