For Pandas, I recommend the third-party Joblib library: https://joblib.readthedo...

paddy_m · on Sept 14, 2023

That looks a bit low level. I would look at Dask and Polars. Dask scales to multiple processes on a single machine and to multiple machines, and its dataframe looks pretty close to pandas. Polars uses multiple cores on the same machine better than pandas (not sure about Dask), but has a significantly different dataframe API than pandas. Polars, primarily through LazyFrames, enables much higher single-core performance too.

nerdponx · on Sept 14, 2023

Yeah, it's "low level" in the sense that you still have to manually chunk up your data. I agree that Dask, Polars, etc. are better if you want a more transparent distributed computing experience. Joblib is great if you already have working single-process code and you just want to parallelize it. It's what scikit-learn uses internally, for example.

But as it pertains to the original thread topic, it's still fairly high-level. I'd consider it a bit higher-level than concurrent.futures, for example.
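
To make the "manually chunk up your data" point concrete, here is a rough sketch of that pattern using only the standard library's concurrent.futures. The helper names (split_into_chunks, sum_of_squares, parallel_sum_of_squares) and the toy workload are made up for illustration; they are not from Joblib or from anything linked above.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """Stand-in for existing single-process code that handles one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_chunks=4, executor_cls=ProcessPoolExecutor):
    """Chunk the input by hand, map the worker over the chunks, combine results.

    executor_cls is a hypothetical knob for this sketch: processes suit
    CPU-bound work, while ThreadPoolExecutor suits I/O-bound work.
    """
    chunks = split_into_chunks(items, n_chunks)
    with executor_cls(max_workers=n_chunks) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000))))
```

With Joblib, the executor and map lines would collapse to roughly Parallel(n_jobs=4)(delayed(sum_of_squares)(c) for c in chunks). The chunking is still yours to write either way, which is the sense in which both sit below Dask or Polars.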