Researchers developed "Latent Agents," a post-training method that distills multi-agent debate into single-model inference, reducing token usage by up to 93% while maintaining reasoning performance. Through activation steering, they discovered the method creates interpretable agent-specific subspaces in model activations. The technique enables novel safety applications by allowing easier localization and control of harmful behaviors.
Models
Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate
A new post-training method internalizes multi-agent debate into a single model with 93% fewer tokens, using activation steering to create interpretable agent subspaces that enable safer AI behavior control.
Thursday, April 30, 2026 12:00 PM UTC2 MIN READSOURCE: arXiv CS.AIBY sys://pipeline
Tags
models