
You're right that a C extension could be implemented in a way that breaks the usual Python C extension contract and then has undefined behaviour. For example, if you release the GIL and then iterate over a Python list, another thread could modify that list (reallocating the underlying buffer to another location), resulting in undefined behaviour.

But this is a case of undefined behaviour even for C extensions that do obey the usual contract. It even affects numpy, for example. The two relevant elements are:

* When a C extension uses the buffer protocol, it should operate directly on the buffer - the whole point of the buffer protocol is to avoid copying and allow speed close to that of native code.

* When a C extension is doing CPU intensive work, it should release the Python GIL (but not use any CPython APIs while it is released, e.g. to instantiate a new object, iterate a list, etc.)

A major selling point of numpy, for example, is that you can use it to do CPU intensive work without the GIL locked, allowing true parallelism. I don't think it's a deal breaker that you can use this to get proper C-level undefined behaviour but it's definitely interesting, and I don't consider it a bug in numpy or Python.
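To make that concrete without pulling in numpy, here is a rough sketch using the standard library's hashlib instead, which is documented to release the GIL while hashing more than 2047 bytes and reads its input through the buffer protocol. The function name `hash_while_mutating` and the sizes are just made up for illustration. One thread hashes a bytearray while another writes into it in place; this is only meant to show the combination of the two points above, not a reliable reproducer, and how many distinct digests you see depends on timing.

```python
import hashlib
import threading


def hash_while_mutating(size=4 * 1024 * 1024, rounds=20):
    """Hash a bytearray repeatedly while another thread writes into it.

    hashlib reads the bytes directly through the buffer protocol and
    releases the GIL for large inputs, so the writer thread can run
    while the C code is still reading the same memory.
    """
    buf = bytearray(size)
    stop = threading.Event()

    def writer():
        i = 0
        while not stop.is_set():
            buf[i % size] ^= 0xFF  # in-place write, never resizes the buffer
            i += 1

    t = threading.Thread(target=writer)
    t.start()
    try:
        # Each call sees whatever bytes happen to be in memory as it reads.
        digests = {hashlib.sha256(buf).hexdigest() for _ in range(rounds)}
    finally:
        stop.set()
        t.join()
    return digests


if __name__ == "__main__":
    print(len(hash_while_mutating()), "distinct digests")
```

Neither thread breaks the extension contract here: the hashing code touches no CPython APIs while the GIL is released, and the writer only does an ordinary item assignment.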



I’ve never used buffers, but my understanding is that you can modify the contents directly from python without even needing a C extension. Surely I could trigger UB by just scribbling into the data structure for one of python’s native types and then accessing it normally.


I don't think that's true - you can use the buffer protocol to give one object a view into another, but those objects' classes need to be written in C in order to allow that exchange to happen.

But even if you were able to modify a buffer-protocol buffer directly using native Python, that would keep the GIL locked, which would prevent simultaneous access from multiple threads.
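For what it's worth, the closest thing I know of from the Python side is a memoryview over a built-in type like bytearray (both of which are themselves implemented in C). A small sketch, with `demo_view` being a made-up name; every operation here runs with the GIL held:

```python
def demo_view():
    buf = bytearray(b"hello")
    view = memoryview(buf)      # zero-copy view onto buf's memory
    view[0] = ord("J")          # write through the view

    # While the view is exported, bytearray refuses to resize, so the
    # view can never be left pointing at freed memory.
    try:
        buf.append(0)
        resize_blocked = False
    except BufferError:
        resize_blocked = True

    view.release()              # drop the export
    buf.append(0)               # resizing is allowed again
    return bytes(buf), resize_blocked
```

So you can write into the exporter's bytes from Python, but only through the checked interface that the C types expose.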



