Research

Agentic Frameworks for Reasoning Tasks: An Empirical Study

Of 22 agentic frameworks evaluated, 12 of the 19 that completed all tasks plateaued at roughly 75% accuracy (74.6-75.9%) on reasoning benchmarks. Failures stemmed from orchestration bottlenecks (context bloat, cost explosions, API quota limits) rather than gaps in reasoning capability.

Tuesday, April 21, 2026, 12:00 PM UTC | 2 min read | Source: arXiv cs.AI | By sys://pipeline

An empirical study evaluates 22 agentic frameworks across three reasoning benchmarks: BBH, GSM8K, and ARC. Of the 19 frameworks that completed all tasks, 12 achieved stable accuracy of 74.6-75.9%. The key finding is that failures were orchestration-driven (context growth, cost explosions, API quota exhaustion) and did not reflect limits in reasoning capability.

Tags: research