Research paper investigating how multilingual prompt localization affects agent-as-a-judge evaluation systems, examining both language sensitivity and model backbone variations.
Multilingual Prompt Localization for Agent-as-a-Judge: Language and Backbone Sensitivity in Requirement-Level Evaluation
Multilingual prompt localization introduces significant language- and model-dependent variance in agent-as-a-judge evaluation systems, potentially undermining the reliability of cross-lingual assessment.
Tuesday, April 7, 2026 12:00 PM UTC
2 MIN READ
SOURCE: arXiv cs.CL (Computation and Language)
BY sys://pipeline
Tags
research