Streaming experts is a technique for running massive Mixture-of-Experts models on consumer hardware by streaming only the expert weights needed for each token from SSD, bypassing the need to hold the full model in RAM. The 1-trillion-parameter Kimi K2.5 model (32B active parameters) now runs on an M2 Max MacBook Pro with 96GB of RAM, and Qwen3.5-397B-A17B runs on an iPhone, albeit at 0.6 tok/s. The technique appears to be improving rapidly through community-driven autoresearch optimization loops.
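The core idea can be illustrated with a minimal sketch: a router picks the top-k experts for the current token, and only those experts' weights are pulled from SSD, with a small LRU cache keeping recently used experts in RAM. The `ExpertStreamer` class, its cache size, and the on-disk `.npy` layout below are illustrative assumptions, not the actual implementation used by any of the projects mentioned.

```python
import collections
import numpy as np

class ExpertStreamer:
    """Illustrative sketch of per-token expert streaming: only the experts
    routed to for the current token are loaded from SSD, via an LRU cache."""

    def __init__(self, weight_files, cache_size=4):
        self.weight_files = weight_files          # expert_id -> path on SSD (assumed .npy files)
        self.cache = collections.OrderedDict()    # expert_id -> weight matrix (LRU order)
        self.cache_size = cache_size

    def load(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)     # cache hit: mark as most recently used
            return self.cache[expert_id]
        # Cache miss: memory-map the expert's weights from SSD instead of
        # keeping the whole model resident in RAM.
        weights = np.load(self.weight_files[expert_id], mmap_mode="r")
        if len(self.cache) >= self.cache_size:
            self.cache.popitem(last=False)        # evict least recently used expert
        self.cache[expert_id] = weights
        return weights

    def forward(self, x, router_logits, top_k=2):
        # Route the token to its top-k experts and stream only those weights.
        top = np.argsort(router_logits)[-top_k:]
        scores = np.exp(router_logits[top])
        gates = scores / scores.sum()             # softmax over the selected experts
        return sum(g * (self.load(int(e)) @ x) for g, e in zip(gates, top))
```

In a real system the cache would be sized against available RAM and the SSD reads batched and overlapped with compute; this sketch only shows why per-token RAM usage scales with the active experts rather than the full parameter count.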
Streaming experts
Streaming expert weights from SSD per token lets trillion-parameter Mixture-of-Experts models like Kimi K2.5 run on an M2 Max (96GB RAM) and Qwen3.5-397B on an iPhone, with performance improving rapidly through community-driven optimization.
Tuesday, March 24, 2026 12:00 PM UTC · 2 min read · Source: Simon Willison · By sys://pipeline
Tags
models