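    import polars as pl

    # Lazily scan the Parquet file; keep Sorcery cards whose mana cost includes "B".
    df = (
        pl.scan_parquet("hf://datasets/minimaxir/mtg-embeddings/mtg_embeddings.parquet")
        .filter(
            pl.col("type").str.contains("Sorcery"),
            pl.col("manaCost").str.contains("B"),
        )
        .collect()
    )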
Polars is awesome to use; I'd highly recommend it. On a single node it is excellent at saturating the CPUs. If you need to distribute the work, put it in a Ray actor and set POLARS_MAX_THREADS according to how heavily it saturates a single node.
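A rough sketch of the Ray actor idea, in case it helps. The helper names (`threads_per_actor`, `actor_options`, `run_on_ray`), the even split of cores, and the one-actor-per-file layout are my own assumptions for illustration, not anything Polars or Ray prescribes. The one thing to be careful about is that POLARS_MAX_THREADS needs to be set before Polars is imported in the worker process, which is why it goes through Ray's `runtime_env` here and `polars` is imported inside the actor method.

```python
def threads_per_actor(cores_per_node: int, actors_per_node: int) -> int:
    """Split a node's cores evenly across actors, with at least one thread each."""
    if cores_per_node < 1 or actors_per_node < 1:
        raise ValueError("cores_per_node and actors_per_node must be >= 1")
    return max(1, cores_per_node // actors_per_node)


def actor_options(cores_per_node: int, actors_per_node: int) -> dict:
    """Build Ray actor options that cap the Polars thread pool (hypothetical helper)."""
    n = threads_per_actor(cores_per_node, actors_per_node)
    return {
        # Reserve the same number of CPUs in Ray's scheduler as Polars may use.
        "num_cpus": n,
        # Env vars are applied to the actor's process before our code imports polars.
        "runtime_env": {"env_vars": {"POLARS_MAX_THREADS": str(n)}},
    }


def run_on_ray(paths: list[str], cores_per_node: int) -> list[int]:
    """Filter each Parquet file in its own actor and return the row counts.

    Assumes Ray and Polars are installed; imported lazily so the helpers
    above can be used without either.
    """
    import ray

    ray.init(ignore_reinit_error=True)

    @ray.remote
    class SorceryFilter:
        def count(self, path: str) -> int:
            import polars as pl  # imported here, after POLARS_MAX_THREADS is set

            return (
                pl.scan_parquet(path)
                .filter(
                    pl.col("type").str.contains("Sorcery"),
                    pl.col("manaCost").str.contains("B"),
                )
                .collect()
                .height
            )

    opts = actor_options(cores_per_node, actors_per_node=len(paths))
    actors = [SorceryFilter.options(**opts).remote() for _ in paths]
    return ray.get([a.count.remote(p) for a, p in zip(actors, paths)])
```

If a single Polars query already pins every core on a node, one actor per node with the full thread count is probably enough; the split only matters when you want several actors sharing a node without oversubscribing it.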