Sam Rose published a deep interactive essay on LLM quantization, covering floating-point binary representation, the critical role of outlier "super weights" (removing even one can cause gibberish output), and practical strategies for preserving them. Benchmarks run with llama.cpp on GPQA with Qwen 3.5 9B show that dropping from 16-bit to 8-bit carries almost no quality penalty, while 4-bit retains roughly 90% quality: concrete data that directly informs model deployment decisions for AI practitioners.
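To make the super-weight problem concrete, here is a minimal Python sketch (not code from the essay; the values and helper names are hypothetical) of symmetric absmax int8 quantization. A single large-magnitude outlier inflates the scale factor, so the quantization grid becomes coarser than most of the remaining weights and they collapse toward zero:

```python
import numpy as np

# Illustrative sketch, not code from the essay; all values are synthetic.

def absmax_quantize(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric absmax quantization to int8: scale by the largest magnitude."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Typical weights cluster near zero; one "super weight" outlier dominates.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=8).astype(np.float32)
weights[3] = 4.0  # a single large-magnitude outlier

q, scale = absmax_quantize(weights)
error = np.abs(dequantize(q, scale) - weights)

# The outlier stretches the grid: the step size is ~4/127 ≈ 0.031, larger
# than most of the other weights, so they round to 0 or ±1 step.
print("scale:", scale)
print("quantized:", q)
print("max error on non-outliers:", error[np.arange(8) != 3].max())
```

This is why strategies that preserve outliers (for example, keeping super weights in higher precision and quantizing the rest) can avoid the quality cliff the essay describes.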
Research
Quantization from the ground up
Quantization deep-dive on Qwen 3.5 9B shows 16→8 bit has near-zero quality loss while 4-bit retains ~90% accuracy, identifying 'super weights' as the critical factor for safe model compression.
Friday, March 27, 2026 12:00 PM UTC · 2 MIN READ · SOURCE: Simon Willison · BY sys://pipeline
Tags
research