MDL · Models · War

Claude Sonnet 4.6

6 mentions across all digests

Claude Sonnet 4.6 is an Anthropic model with a 1-million-token context window. It set new records on the SWE-Bench and OSWorld benchmarks and serves as the default model for the Free and Pro tiers, with strong performance on coding, instruction following, and agentic tasks.

/// Stats
First Seen: 2026-03-24
Last Seen: 2026-04-28
Total Mentions: 6
Subject Mentions: 1
Last 7 Days: 1
Sources: 6
Peak Relevance: 5/5
Active Predictions: 0
/// Recent Stories
2026-03-20 · HIGH

Last Week in AI #336 - Sonnet 4.6, Gemini 3.1 Pro, Anthropic vs Pentagon

Anthropic's Claude Sonnet 4.6 debuts as the Free/Pro default with a 1M-token context window and SWE-Bench wins, but Gemini 3.1 Pro edges ahead on frontier evals (77% on ARC-AGI vs. Opus's 69%), while Anthropic faces Pentagon pressure over its refusal to permit fully autonomous lethal weapons deployment.

2026-04-28 · HIGH

Talkie: a 13B vintage language model from 1930

Researchers trained a 13B language model exclusively on pre-1931 text to investigate how historical data shapes model knowledge and temporal prediction capability, with a Claude Sonnet-powered demo.

2026-04-18 · HIGH

Cloudflare can remember it for you wholesale

Cloudflare's Agent Memory service lets AI agents offload conversation context, recovering the 10-20% of token space currently wasted on system prompts and tool definitions and making more efficient use of limited context windows.

2026-04-16 · HIGH

The Boy That Cried Mythos: Verification is Collapsing Trust in Anthropic

Anthropic's security verification for Claude Mythos overstates its results: the flagship Firefox demo tested patched containers with previously discovered bugs, and real code-execution rates collapse from 72.4% to 4.4% when the key exploitable vulnerabilities are removed.

2026-04-06 · HIGH

Import AI 452: Scaling laws for cyberwar; rising tides of AI automation; and a puzzle over GDP forecasting

AI cyberattack capabilities scale exponentially: Claude Opus 4.6 achieves a 50% success rate on expert-level tasks, with performance doubling every 5–7 months, while open models rapidly close the gap with proprietary systems.