N-Day-Bench is a new benchmark from Winfunc Research that measures how well frontier LLMs can discover real vulnerabilities in code. It tests models on vulnerabilities disclosed after their training cutoffs. The standardized harness prevents reward hacking and is updated monthly with the latest model versions. All results are publicly available, so capability can be tracked transparently.
Research
N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?
Winfunc Research's N-Day-Bench standardizes LLM security benchmarking: it uses vulnerabilities disclosed after model training cutoffs to measure how well models discover vulnerabilities in code, while guarding against reward hacking.
Tuesday, April 14, 2026 12:00 PM UTC | 2 min read | Source: Hacker News | By sys://pipeline