Love your work, but pairwise scoring also skips the broader group context; benchmarking against list‑wise or MMR baselines would highlight the trade‑offs. I'm also curious what the compute and latency hit looks like when you run all those pair comparisons in production, since scoring every pair in an n‑item list means on the order of n(n−1)/2 model calls.
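For concreteness, here's a minimal sketch of the kind of list‑aware baseline I mean (MMR over cosine similarity of dense embeddings; all names here are hypothetical, not from your implementation). It selects items greedily, trading relevance against redundancy with the already‑selected set, so it sees group context in one O(n·k) pass rather than scoring all pairs:

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    lam=1.0 is pure relevance ranking; lower lam penalizes
    items similar to ones already selected.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            rel = cos(query_vec, doc_vecs[i])
            # Redundancy = max similarity to anything already picked.
            red = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                      default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

A benchmark against something like this would show whether the pairwise model's quality gain justifies the quadratic call count.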