
If you want to try it out, you can lazily load the dataset from HF and filter it like this:

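  import polars as pl

  # scan_parquet is lazy; nothing is read until .collect()
  df = (
      pl.scan_parquet('hf://datasets/minimaxir/mtg-embeddings/mtg_embeddings.parquet')
      .filter(
          # multiple predicates passed to filter() are combined with AND
          pl.col("type").str.contains("Sorcery"),
          pl.col("manaCost").str.contains("B"),
      )
      .collect()
  )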

Polars is awesome to use and I would highly recommend it. On a single node it is excellent at saturating CPUs. If you need to distribute the work, put it in a Ray actor and set POLARS_MAX_THREADS on it, with the value depending on how much Polars saturates a single node.
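
Here is a rough sketch of what the Ray setup could look like, not a tested recipe. The names `actor_env`, `PolarsWorker`, and `count_rows` are made up for illustration, and the thread count of 4 is an arbitrary assumption. The one thing that matters is that POLARS_MAX_THREADS is in the worker's environment before Polars is imported there, which is why it goes through Ray's `runtime_env` and why the import happens inside the actor method.

```python
def actor_env(threads: int) -> dict:
    """Build a Ray runtime_env that caps the Polars thread pool (hypothetical helper)."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Polars reads POLARS_MAX_THREADS when it is first imported, so the
    # variable must already be set in the worker process at that point.
    return {"env_vars": {"POLARS_MAX_THREADS": str(threads)}}


def run(paths: list[str], threads: int = 4) -> list[int]:
    """Count rows in each Parquet file, one actor per file (illustrative only)."""
    # Imported lazily so actor_env() can be used without Ray installed.
    import ray

    # Reserve as many CPUs as Polars is allowed threads, so Ray does not
    # oversubscribe the node with other tasks.
    @ray.remote(num_cpus=threads, runtime_env=actor_env(threads))
    class PolarsWorker:
        def count_rows(self, path: str) -> int:
            # Import inside the worker, after the env var is in place.
            import polars as pl

            return pl.scan_parquet(path).select(pl.len()).collect().item()

    ray.init(ignore_reinit_error=True)
    workers = [PolarsWorker.remote() for _ in paths]
    return ray.get([w.count_rows.remote(p) for w, p in zip(workers, paths)])
```

Matching `num_cpus` to the thread cap is a design choice on my part: it keeps Ray's scheduling in line with what Polars will actually use, but you may want a different ratio depending on how well your queries parallelize.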


