Cursor describes "real-time RL," a technique that uses production inference tokens from its Composer agentic coding model as the training signal, with user responses aggregated into a reward. This closes the train-test mismatch by replacing simulated users with real ones, and it lets Cursor deploy new checkpoints as often as every five hours. The approach was first validated on Tab and is now applied to Composer's Auto mode.
Improving Composer through real-time RL
Cursor trains Composer using real-time RL on production inference tokens, replacing simulation with actual user feedback so that the model can be updated as often as every five hours.
Saturday, March 28, 2026, 12:00 PM UTC · 2 min read · Source: Hacker News · By sys://pipeline
Tags: models
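The summary says user responses are aggregated into a reward but does not say how. The following is a minimal sketch under stated assumptions, not Cursor's implementation: the signal names (`accepted`, `edited`, `ignored`, `reverted`), their weights, and the helpers `aggregate_reward` and `advantages` are all hypothetical. The mean-reward baseline is a generic REINFORCE-style choice, used here only to show how a scalar reward per production rollout could feed a policy-gradient update.

```python
from dataclasses import dataclass, field
from statistics import mean

# HYPOTHETICAL mapping from observed user responses to scalar scores.
# Event names and weights are illustrative assumptions, not Cursor's.
SIGNAL_SCORES = {
    "accepted": 1.0,
    "edited": 0.5,
    "ignored": 0.0,
    "reverted": -1.0,
}


@dataclass
class Rollout:
    """One production inference (a model response) plus the user signals seen after it."""
    rollout_id: str
    signals: list[str] = field(default_factory=list)


def aggregate_reward(rollout: Rollout) -> float:
    """Average the scores of all known user signals; no feedback yields 0.0."""
    scores = [SIGNAL_SCORES[s] for s in rollout.signals if s in SIGNAL_SCORES]
    return mean(scores) if scores else 0.0


def advantages(rollouts: list[Rollout]) -> dict[str, float]:
    """Subtract the batch-mean reward as a baseline (generic REINFORCE-style)."""
    rewards = {r.rollout_id: aggregate_reward(r) for r in rollouts}
    baseline = mean(rewards.values())
    return {rid: rew - baseline for rid, rew in rewards.items()}


if __name__ == "__main__":
    batch = [
        Rollout("a", ["accepted", "edited"]),
        Rollout("b", ["reverted"]),
        Rollout("c", []),
    ]
    print(advantages(batch))
```

In this sketch, rollouts whose users accepted the output receive a positive advantage and reverted ones a negative advantage, so a policy-gradient step would push the model toward responses real users kept. Batching rollouts this way also hints at why checkpoints could ship on a fixed cadence: each batch of collected feedback becomes one training step.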