Comprehensive technical survey of LLM reasoning model advances since DeepSeek R1, focused on inference-time compute scaling methods. Covers chain-of-thought prompting, majority voting, beam search, and the s1 paper's "budget forcing" via "Wait" tokens — a technique where appending special tokens causes models to self-verify and extend reasoning before finalizing answers. Provides a useful taxonomy distinguishing inference-time scaling (no weight changes) from training-time approaches like RL and distillation, with comparisons across all four categories.
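The majority-voting approach mentioned above (often called self-consistency) can be sketched as follows. This is a minimal illustration, not the survey's implementation: `sample_answer` is a hypothetical stub standing in for sampling an LLM at nonzero temperature and parsing out its final answer.

```python
from collections import Counter

def sample_answer(question: str, i: int) -> str:
    # Hypothetical stub: a real system would sample a full chain of
    # thought from the model and extract the final answer string.
    canned = ["42", "41", "42", "42", "40"]
    return canned[i % len(canned)]

def majority_vote(question: str, k: int = 5) -> str:
    # Sample k independent answers and return the most frequent one.
    answers = [sample_answer(question, i) for i in range(k)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))  # "42" wins 3 of the 5 sampled votes
```

The appeal of this method, as the survey's taxonomy emphasizes, is that it trades extra inference compute (k model calls) for accuracy with no change to the model's weights.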
The State of LLM Reasoning Model Inference
A comprehensive taxonomy of inference-time compute scaling for LLM reasoning, including "Wait" tokens for self-verification without retraining, offers practical alternatives to expensive training-time RL approaches.
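The "Wait"-token mechanism can be sketched roughly as follows, assuming a hypothetical `generate` stub in place of a real model call: each time the model emits its end-of-thinking marker before the budget is spent, the marker is stripped and "Wait" is appended so decoding continues.

```python
def generate(trace: str) -> str:
    # Hypothetical stub: a real implementation would call an LLM and
    # decode until the stop marker. Here it returns one canned step.
    return " ...one more check...</think>"

def budget_force(prompt: str, n_waits: int = 2, stop: str = "</think>") -> str:
    # s1-style budget forcing sketch: suppress the stop marker n_waits
    # times, appending "Wait," so the model keeps reasoning (and can
    # self-verify) before it is allowed to finalize an answer.
    trace = prompt
    for i in range(n_waits + 1):
        chunk = generate(trace)
        if i < n_waits and chunk.endswith(stop):
            trace += chunk[: -len(stop)] + " Wait,"
        else:
            trace += chunk
            break
    return trace

out = budget_force("<think>", n_waits=2)
print(out.count("Wait"))  # 2 forced continuations before the final stop
```

The `n_waits` parameter is the "budget": raising it extends reasoning length at inference time, which is exactly the no-retraining lever the article contrasts with training-time RL.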
Friday, March 27, 2026, 12:00 PM UTC · 2 min read · Source: Ahead of AI (Sebastian Raschka)
Tags: models