Models /// FEATURED

AI scientists produce results without reasoning scientifically

Across 25,000 runs, LLM-based scientific agents ignore evidence in 68% of cases and revise beliefs only 26% of the time—revealing that current models execute workflows mechanically but lack the self-correcting mechanisms fundamental to actual scientific reasoning.

Wednesday, April 22, 2026 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.AI · BY sys://pipeline

Researchers evaluated LLM-based scientific agents across eight domains in over 25,000 runs, systematically analyzing their reasoning patterns. They found that agents ignore evidence in 68% of traces and revise beliefs via refutation only 26% of the time—failures that persist even when agents are given high-quality reasoning examples. The choice of base LLM accounts for 41.4% of outcome variance, versus just 1.5% for agent architecture, indicating that current systems execute workflows mechanically but lack the self-correcting mechanisms that define scientific inquiry.
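The variance split the study reports (41.4% for the base LLM versus 1.5% for architecture) is the kind of figure produced by a standard between-group variance decomposition (eta-squared). Below is a minimal, hedged sketch of that calculation; the run records, factor names, and scores are invented for illustration and are not the study's data.

```python
# Illustrative sketch of a one-factor variance decomposition (eta-squared):
# fraction of total outcome variance explained by grouping runs on a factor.
# All data below is toy data, not from the paper.
from collections import defaultdict
from statistics import mean

def eta_squared(runs, factor):
    """Between-group sum of squares / total sum of squares for `factor`."""
    scores = [r["score"] for r in runs]
    grand = mean(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    groups = defaultdict(list)
    for r in runs:
        groups[r[factor]].append(r["score"])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

# Toy runs: outcomes track the base model strongly, the architecture barely.
runs = [
    {"model": "A", "arch": "react", "score": 0.90},
    {"model": "A", "arch": "plan",  "score": 0.88},
    {"model": "B", "arch": "react", "score": 0.40},
    {"model": "B", "arch": "plan",  "score": 0.42},
]
print(eta_squared(runs, "model"))  # close to 1: model explains most variance
print(eta_squared(runs, "arch"))   # close to 0: architecture explains little
```

On this toy data, grouping by model captures nearly all the spread in scores, while grouping by architecture captures almost none, mirroring the qualitative pattern the study describes.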

Tags
models