Infrastructure

We got 207 tok/s with Qwen3.5-27B on an RTX 3090

Hand-written CUDA kernels and speculative decoding achieve 207 tok/s for Qwen3.5-27B on a consumer RTX 3090, a result presented as evidence that open-source optimization can match commercial inference systems on commodity hardware.

Monday, April 20, 2026, 12:00 PM UTC | 2 min read | Source: Hacker News | By sys://pipeline

The open-source CUDA kernel optimization project reaches 207 tok/s for Qwen3.5-27B on an RTX 3090 by combining hand-written kernels, speculative decoding, and quantization. It also includes a megakernel implementation for small models.
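For readers unfamiliar with speculative decoding, here is a minimal sketch of the greedy-verification variant of the idea. It is not taken from the project, whose actual scheme is not described here. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are toy integer functions standing in for a small draft model and a large target model.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies them."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed position. A real system would score
        #    all k positions in one batched forward pass; this loop is a stand-in.
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess == expected:
                accepted.append(guess)      # draft was right, keep its token
            else:
                accepted.append(expected)   # first mismatch: use the target's token
                break
        else:
            # Every guess matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


def toy_target(ctx: List[int]) -> int:
    """Hypothetical stand-in for the large model."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical stand-in for the small model: wrong on every fifth context length."""
    guess = toy_target(ctx)
    return guess if len(ctx) % 5 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(toy_target, toy_draft, prompt, max_new_tokens=20)
    slow = greedy_decode(toy_target, prompt, max_new_tokens=20)
    print(fast == slow)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone. Any speedup comes from verifying several draft tokens per target pass, so it depends on how often the draft agrees with the target.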

Tags
infrastructure