Safety

Water company wasted $200k on bad answers from an AI model – so it built its own slop-filtering system

Waterline spent $200k learning that frontier LLMs hallucinate materials science, so the company built Rozum, a deterministic ensemble system that flags unsupported claims in 76% of frontier-model responses in high-stakes research.

Thursday, March 19, 2026 12:00 PM UTC · 2 MIN READ · SOURCE: The Register · BY sys://pipeline

Waterline Development lost $200k and four months of work after LLMs (Grok, ChatGPT) confidently hallucinated materials-science guidance. In response, the company built Rozum, a multi-model orchestration system that runs ensemble models in parallel behind a deterministic verification layer. Rozum flags unsupported claims in 76% of frontier-model responses and outperforms GPT-4, Grok 4, and Gemini 3.1 Pro on Humanity's Last Exam benchmarks. Aimed at high-stakes research decisions rather than real-time use, it is a concrete example of production-grade hallucination mitigation via model ensembling and deterministic tool grounding (e.g. RDKit).
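The article doesn't publish Rozum's internals, but the pattern it describes (several models queried in parallel, a vote over their answers, then a deterministic check to ground the winner) can be sketched roughly as follows. Every name, value, and threshold here is an illustrative assumption, not Rozum's actual code:

```python
# Illustrative sketch only: the article does not publish Rozum's design or
# API, so every function name and threshold below is an assumption. It shows
# the general pattern described: query an ensemble, vote on the answer, then
# ground the consensus with a deterministic verifier.

def query_ensemble(prompt: str) -> dict[str, float]:
    # Stand-in for parallel calls to several LLMs (the article mentions
    # Grok and ChatGPT). Here: claimed melting point of aluminium in deg C,
    # with one model hallucinating a wrong value.
    return {"model_a": 660.3, "model_b": 660.3, "model_c": 1085.0}

def majority_claim(answers: dict[str, float], tolerance: float = 1.0):
    """Cluster numeric answers within `tolerance` and return the consensus
    value plus the fraction of models that agree with it."""
    values = sorted(answers.values())
    clusters: list[list[float]] = []
    for v in values:
        if clusters and abs(v - clusters[-1][-1]) <= tolerance:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    best = max(clusters, key=len)
    return sum(best) / len(best), len(best) / len(values)

def deterministic_check(value: float) -> bool:
    # Stand-in for tool grounding (the article cites RDKit for chemistry):
    # reject any claimed melting point outside hard physical bounds.
    return -273.15 <= value <= 3500.0

answers = query_ensemble("What is the melting point of aluminium?")
value, agreement = majority_claim(answers)
# Flag the response when the ensemble disagrees or the check fails.
flagged = agreement < 0.5 or not deterministic_check(value)
```

With these stubbed answers, two of three models cluster at 660.3, so the claim passes; a real system would replace the stubs with live model calls and domain-specific verifiers (RDKit for molecular validity, unit checks, database lookups), and the agreement threshold would be tuned per use case.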

Tags
safety