SWE-Bench Pro
4 mentions across all digests
SWE-Bench Pro is a contamination-resistant software engineering benchmark evaluating AI models on multi-language coding tasks, on which GLM-5.1, GPT-5.3-Codex, and GPT-5.4 mini have each claimed state-of-the-art results.
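For context on what such a claim means mechanically: SWE-bench-style benchmarks count a task as resolved only when the model's patch makes the designated failing tests pass without breaking the ones that already passed. A minimal sketch of that scoring logic, using illustrative names (`Task`, `score`) rather than the actual SWE-Bench Pro harness API:

```python
# Sketch of SWE-bench-style scoring: each task ships a repo snapshot and
# hidden tests; a task counts as "resolved" only if the fail-to-pass tests
# now pass and the pass-to-pass tests still pass after the model's patch.
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    fail_to_pass: list[str]   # tests that must flip from failing to passing
    pass_to_pass: list[str]   # tests that must keep passing

def score(results: dict[str, dict[str, bool]], tasks: list[Task]) -> float:
    """Fraction of tasks where every required test passes post-patch."""
    resolved = 0
    for task in tasks:
        outcomes = results.get(task.task_id, {})
        required = task.fail_to_pass + task.pass_to_pass
        if required and all(outcomes.get(t, False) for t in required):
            resolved += 1
    return resolved / len(tasks) if tasks else 0.0

tasks = [Task("repo__issue-1", ["test_fix"], ["test_existing"])]
results = {"repo__issue-1": {"test_fix": True, "test_existing": True}}
print(f"resolved rate: {score(results, tasks):.0%}")  # 100%
```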
[AINews] Anthropic Claude Opus 4.7 - literally one step better than 4.6 in every dimension
Anthropic's Claude Opus 4.7 claims #1 benchmark rankings, triples vision input resolution (to 2,576px), and delivers up to 50% token efficiency gains via a new tokenizer, alongside a new xhigh reasoning effort level.
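Token efficiency claims of this kind are straightforward to sanity-check: encode the same text with two tokenizers and compare counts. A rough sketch using public tiktoken vocabularies purely as stand-ins, since nothing here assumes access to Anthropic's new tokenizer:

```python
# Measuring a "more efficient tokenizer" claim: encode identical text with
# an old and a new vocabulary and report the fractional token reduction.
# cl100k_base and o200k_base are public tiktoken encodings used only for
# illustration; they are not Anthropic tokenizers.
import tiktoken

def token_savings(text: str, old: str = "cl100k_base", new: str = "o200k_base") -> float:
    """Fractional reduction in token count going from `old` to `new`."""
    old_tokens = len(tiktoken.get_encoding(old).encode(text))
    new_tokens = len(tiktoken.get_encoding(new).encode(text))
    return 1 - new_tokens / old_tokens

sample = "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)"
print(f"token savings: {token_savings(sample):.1%}")
```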
Introducing GPT-5.3-Codex
AI joins the 8-hour work day as GLM ships 5.1 open source LLM, beating Opus 4.6 and GPT 5.4 on SWE-Bench Pro
Zhipu AI's open-source GLM 5.1 outperforms Claude Opus 4.6 and GPT 5.4 on SWE-Bench Pro, signaling that open-source models are closing the gap with frontier closed models on software engineering benchmarks.
Introducing GPT-5.4 mini and nano
OpenAI releases GPT-5.4 mini and nano, variants that achieve a 2x speed improvement over GPT-5 mini while maintaining near-equivalent performance, targeting cost-sensitive agentic and real-time applications.
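Cost-sensitive agentic stacks typically exploit model tiers like these by routing requests: cheap, latency-bound calls go to the smallest model, and harder tasks escalate to a larger one. A hypothetical routing sketch; the thresholds and tier assignments are illustrative, not OpenAI guidance:

```python
# Hypothetical model router: pick the cheapest tier that plausibly fits
# the request's latency budget and complexity. Model names come from the
# digest above; the cutoffs are invented for illustration.
def pick_model(prompt: str, latency_budget_ms: int) -> str:
    if latency_budget_ms < 500 or len(prompt) < 200:
        return "gpt-5.4-nano"   # fastest, cheapest tier for real-time calls
    if len(prompt) < 4000:
        return "gpt-5.4-mini"   # near-flagship quality at lower cost
    return "gpt-5.3-codex"      # full model for heavy coding tasks

print(pick_model("Rename this variable across the file.", latency_budget_ms=300))
# -> gpt-5.4-nano
```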