Hacker News
KVarN: Native vLLM backend for KV-cache quantization by Huawei (github.com/huawei-csl)
115 points by theanonymousone 10 hours ago | 12 comments
Sinkhorn: Make LLMs even smaller through quantisation while maintaining accuracy (github.com/huawei-csl)
4 points by ilitirit 8 months ago | 1 comment
