Models /// FEATURED

AI scientists produce results without reasoning scientifically

Across 25,000 runs, LLM-based scientific agents ignore evidence in 68% of cases and revise beliefs only 26% of the time—revealing that current models execute workflows mechanically but lack the self-correcting mechanisms fundamental to actual scientific reasoning.

Wednesday, April 22, 2026 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.AI · BY sys://pipeline

Researchers evaluated LLM-based scientific agents across eight domains in over 25,000 runs, systematically analyzing their reasoning patterns. They found that agents ignore evidence in 68% of traces and revise beliefs via refutation only 26% of the time—failures that persist even when agents are given high-quality reasoning examples. The choice of base LLM accounts for 41.4% of outcome variance, versus just 1.5% for agent architecture, indicating that current systems execute workflows mechanically but lack the self-correcting mechanisms that define scientific inquiry.
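The variance split the study reports (41.4% for the base LLM versus 1.5% for architecture) is the kind of figure produced by a standard between-group variance decomposition (eta-squared). Below is a minimal, hedged sketch of that calculation; the run records, factor names, and scores are invented for illustration and are not the study's data.

```python
# Illustrative sketch of a one-factor variance decomposition (eta-squared):
# fraction of total outcome variance explained by grouping runs on a factor.
# All data below is toy data, not from the paper.
from collections import defaultdict
from statistics import mean

def eta_squared(runs, factor):
    """Between-group sum of squares / total sum of squares for `factor`."""
    scores = [r["score"] for r in runs]
    grand = mean(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    groups = defaultdict(list)
    for r in runs:
        groups[r[factor]].append(r["score"])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

# Toy runs: outcomes track the base model strongly, the architecture barely.
runs = [
    {"model": "A", "arch": "react", "score": 0.90},
    {"model": "A", "arch": "plan",  "score": 0.88},
    {"model": "B", "arch": "react", "score": 0.40},
    {"model": "B", "arch": "plan",  "score": 0.42},
]
print(eta_squared(runs, "model"))  # close to 1: model explains most variance
print(eta_squared(runs, "arch"))   # close to 0: architecture explains little
```

On this toy data, grouping by model captures nearly all the spread in scores, while grouping by architecture captures almost none, mirroring the qualitative pattern the study describes.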

Tags
models