Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Using KV in the caching context is a bit confusing because it usually means key-value in the storage sense of the word (like Redis), but for LLMs, it means the key and value tensors. So IIUC, the cache will store the results of the K and V matrix multiplications for a given prompt and the only computation that needs to be done is the Q and attention calculations.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: