Researchers conduct a systematic evaluation comparing large language models with human experts on Mathematical Contest in Modeling (MCM) problems. The study assesses LLM performance across diverse mathematical modeling scenarios that require complex reasoning and multi-stage problem solving, offering insight into current LLM capabilities and limitations in specialized mathematical domains.
Research
How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling
Systematic benchmarking reveals LLMs still lag behind human experts on complex mathematical modeling tasks requiring multi-stage reasoning.
Tuesday, April 7, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.CL (Computation & Language) · BY sys://pipeline
Tags: research