This research uses chess to examine how language models develop reasoning through fine-tuning and reinforcement learning. Training on multi-move trajectories produces faithful reasoning, and RL reduces hallucination rates and improves move quality. The authors release checkpoints and code, along with a 7B model that surpasses open-source baselines.
Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning
Chess-based trajectory training combined with reinforcement learning enables smaller 7B language models to develop faithful reasoning and reduce hallucinations, beating open-source baselines.
Wednesday, April 8, 2026 12:00 PM UTC | 2 MIN READ | SOURCE: arXiv CS.LG (Machine Learning) | BY sys://pipeline
Tags
research