Lyptus Research applies METR's time-horizon methodology to offensive cybersecurity across 7 benchmarks, finding AI capability is doubling every 9.8 months overall — and 5.7 months since 2024. The most capable frontier models (GPT-5.3 Codex and Opus 4.6) now match human expert performance on 3+ hour offensive security tasks, and the researchers believe even these estimates understate recent progress due to fixed evaluation budgets.
Models
Offensive Cybersecurity Time Horizons
AI capability in offensive cybersecurity is doubling every 5.7 months since 2024, with Opus 4.6 and GPT-5.3 Codex now matching human expert performance on multi-hour hacking tasks.
Friday, April 3, 2026 12:00 PM UTC2 MIN READSOURCE: LobstersBY sys://pipeline
Tags
models