Streaming experts is a technique for running massive Mixture-of-Experts models on consumer hardware by streaming only the expert weights needed for each token from SSD, bypassing the need to hold the full model in RAM. The 1-trillion-parameter Kimi K2.5 model (32B active parameters) now runs on an M2 Max MacBook Pro with 96GB of RAM, and Qwen3.5-397B-A17B runs on an iPhone, albeit at 0.6 tok/s. The technique appears to be improving rapidly through community-driven autoresearch optimization loops.
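The core idea can be illustrated with a minimal sketch: a router picks the top-k experts for the current token, and only those experts' weights are pulled from SSD, with a small LRU cache keeping recently used experts in RAM. The `ExpertStreamer` class, its cache size, and the on-disk `.npy` layout below are illustrative assumptions, not the actual implementation used by any of the projects mentioned.

```python
import collections
import numpy as np

class ExpertStreamer:
    """Illustrative sketch of per-token expert streaming: only the experts
    routed to for the current token are loaded from SSD, via an LRU cache."""

    def __init__(self, weight_files, cache_size=4):
        self.weight_files = weight_files          # expert_id -> path on SSD (assumed .npy files)
        self.cache = collections.OrderedDict()    # expert_id -> weight matrix (LRU order)
        self.cache_size = cache_size

    def load(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)     # cache hit: mark as most recently used
            return self.cache[expert_id]
        # Cache miss: memory-map the expert's weights from SSD instead of
        # keeping the whole model resident in RAM.
        weights = np.load(self.weight_files[expert_id], mmap_mode="r")
        if len(self.cache) >= self.cache_size:
            self.cache.popitem(last=False)        # evict least recently used expert
        self.cache[expert_id] = weights
        return weights

    def forward(self, x, router_logits, top_k=2):
        # Route the token to its top-k experts and stream only those weights.
        top = np.argsort(router_logits)[-top_k:]
        scores = np.exp(router_logits[top])
        gates = scores / scores.sum()             # softmax over the selected experts
        return sum(g * (self.load(int(e)) @ x) for g, e in zip(gates, top))
```

In a real system the cache would be sized against available RAM and the SSD reads batched and overlapped with compute; this sketch only shows why per-token RAM usage scales with the active experts rather than the full parameter count.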
Streaming experts
Streaming expert weights from SSD per token lets trillion-parameter Mixture-of-Experts models like Kimi K2.5 run on an M2 Max (96GB RAM) and Qwen3.5-397B on an iPhone, with performance improving rapidly through community-driven optimization.
Tuesday, March 24, 2026 12:00 PM UTC · 2 min read · Source: Simon Willison · By sys://pipeline
Tags
models