That looks a bit low-level. I would look at Dask and Polars. Dask scales to multiple processes on a single machine and to multiple machines, and its dataframe API looks pretty close to pandas. Polars uses multiple cores on the same machine better than pandas (not sure about Dask), but its dataframe API is significantly different from pandas'. Polars also gets much higher single-core performance, primarily through LazyFrames.
Yeah, it's "low level" in the sense that you still have to manually chunk up your data. I agree that Dask, Polars, etc. are better if you want a more transparent distributed computing experience. Joblib is great if you already have working single-process code and just want to parallelize it. It's what scikit-learn uses internally, for example.
But as it pertains to the original thread topic, it's still fairly high-level. I'd consider it a bit higher-level than concurrent.futures, for example.
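
To make the comparison concrete, here is a rough sketch of the manual-chunking pattern using only concurrent.futures. The helper names (`chunked`, `sum_of_squares`, `parallel_sum_of_squares`) and the workload are made up for illustration, and it uses threads only so that it runs anywhere; for CPU-bound work you would likely want processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    # Stand-in for existing single-process code (hypothetical workload).
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    # The manual step: you decide how the data is split up.
    chunks = chunked(data, n_workers)
    # Threads keep this sketch portable. For CPU-bound work, swap in
    # ProcessPoolExecutor and call this from under `if __name__ == "__main__":`.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


# Roughly the same pool step with joblib (not run here):
#   from joblib import Parallel, delayed
#   partials = Parallel(n_jobs=n_workers)(
#       delayed(sum_of_squares)(c) for c in chunks
#   )
```

Either way the chunking is on you; joblib mostly saves you the executor bookkeeping.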