SlopCodeBench is a new benchmark targeting a specific and practical failure mode: coding agents producing output of degrading quality over long, iterative task sequences. The benchmark systematically measures how agents "slop out," losing coherence, correctness, or adherence to intent as tasks accumulate context and complexity. It is highly relevant for anyone relying on agentic coding tools in real workflows, where multi-step tasks are the norm.
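To make the measurement concrete, here is a minimal sketch of the kind of harness such a benchmark implies: run an agent over a sequence of dependent tasks while its conversational context accumulates, score each step independently, and fit a trend to see whether quality drifts downward. The `Agent` protocol, `run_sequence`, `degradation_slope`, and the scorer callable are all hypothetical stand-ins for illustration, not SlopCodeBench's actual interface.

```python
from typing import Callable, Protocol


class Agent(Protocol):
    """Any agent that accepts an instruction and returns an output,
    retaining its own growing history between calls (assumed interface)."""
    def act(self, instruction: str) -> str: ...


def run_sequence(
    agent: Agent,
    instructions: list[str],
    score: Callable[[int, str], float],  # e.g., fraction of step-specific tests passed
) -> list[float]:
    """Feed steps one at a time and score each output in isolation."""
    return [score(i, agent.act(inst)) for i, inst in enumerate(instructions)]


def degradation_slope(scores: list[float]) -> float:
    """Least-squares slope of score vs. step index; a negative slope
    means the agent is 'slopping out' as the sequence grows."""
    n = len(scores)
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var if var else 0.0
```

The key design point this sketch captures is that each step is scored independently, so a falling slope isolates degradation caused by accumulated context rather than by any single task being harder.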
Research
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
SlopCodeBench reveals that coding agents systematically degrade in output quality and adherence to task intent as iterative sequences grow longer, exposing a critical failure mode in real-world agentic development workflows.
Friday, March 27, 2026, 12:00 PM UTC
2 min read
Source: arXiv CS.AI
By sys://pipeline
Tags
research