TurboQuant, explained simply
1:28:38–1:29:17 · 39s
Luke breaks down Google’s TurboQuant: compressing the LLM KV cache via polar coordinates and one-bit error correction to slash memory use with no retraining.
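The blurb names two ideas: storing KV cache vectors in polar form, and a one-bit correction for what that compact form loses. The Python below is a toy sketch of those two ideas. It is not Google's TurboQuant code and rests on several assumptions. Coordinates are grouped into consecutive pairs. Each pair's angle is quantized to a few bits, while its radius stays a plain float. The "one-bit error correction" is modeled as a QJL-style sketch: one sign bit per random Gaussian projection of the leftover error, plus that error's norm, which yields an unbiased estimate of the missing part of a query-key dot product. Every function name is hypothetical. The sketch only transforms vectors that are already cached and never touches model weights, which fits the "no retraining" point.

```python
import math
import random

# Toy sketch only, NOT Google's TurboQuant implementation.
# Assumptions: consecutive coordinate pairs, uniform angle quantization,
# float radii, and a QJL-style one-bit sign sketch of the residual.

def polar_encode(x, bits):
    """Map each pair (x[2i], x[2i+1]) to (radius, quantized angle code)."""
    assert len(x) % 2 == 0, "toy version needs an even dimension"
    levels = 2 ** bits
    step = 2 * math.pi / levels
    radii, codes = [], []
    for a, b in zip(x[0::2], x[1::2]):
        theta = math.atan2(b, a) % (2 * math.pi)          # angle in [0, 2*pi)
        radii.append(math.hypot(a, b))                     # radius kept as float
        codes.append(int(round(theta / step)) % levels)   # nearest angle level
    return radii, codes

def polar_decode(radii, codes, bits):
    """Rebuild an approximate Cartesian vector from radii and angle codes."""
    step = 2 * math.pi / 2 ** bits
    out = []
    for r, c in zip(radii, codes):
        out.extend([r * math.cos(c * step), r * math.sin(c * step)])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_projections(dim, m, seed=0):
    """m random Gaussian directions, shared by encoder and estimator."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]

def sign_sketch(residual, projections):
    """Store one bit (the sign) per projection, plus the residual's norm."""
    signs = [1 if dot(s, residual) >= 0 else -1 for s in projections]
    return signs, math.sqrt(dot(residual, residual))

def estimate_residual_dot(query, signs, norm, projections):
    """Unbiased estimate of <query, residual> from the stored sign bits.

    For s ~ N(0, I): E[sign(<s, e>) * <s, q>] = sqrt(2/pi) * <q, e> / ||e||,
    so rescaling by sqrt(pi/2) * ||e|| recovers <q, e> on average.
    """
    m = len(projections)
    total = sum(b * dot(s, query) for b, s in zip(signs, projections))
    return math.sqrt(math.pi / 2) * norm * total / m

# Usage: compress one "key", then correct a query-key dot product.
rng = random.Random(42)
dim = 8
key = [rng.gauss(0, 1) for _ in range(dim)]
query = [rng.gauss(0, 1) for _ in range(dim)]
radii, codes = polar_encode(key, bits=3)
approx = polar_decode(radii, codes, bits=3)
residual = [k - a for k, a in zip(key, approx)]
proj = make_projections(dim, m=256)
signs, norm = sign_sketch(residual, proj)
corrected = dot(query, approx) + estimate_residual_dot(query, signs, norm, proj)
print("exact:", dot(query, key), "polar only:", dot(query, approx),
      "with 1-bit correction:", corrected)
```

The correction is unbiased rather than exact: with only 256 sign bits a single estimate can still land on either side of the true value, and its spread shrinks roughly as one over the square root of the number of projections.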