RAG Extraction Pipeline for a Power Utility
A major power utility needed equipment-qualification data extracted from thousands of dense engineering documents at high accuracy. The existing manual, spreadsheet-driven process was slow and let errors through silently, which is unacceptable in a safety-critical engineering context.
We built a retrieval-augmented Azure OpenAI extraction pipeline paired with a deterministic verification tool, and made evaluation a first-class part of delivery. A DeepEval suite combined deterministic scoring with LLM-judged faithfulness and hallucination checks. It ran as a CI gate, so any confidence regression failed the build before the change could ship. A companion verification pass cross-checked every extracted value against source data.
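The deterministic half of that design can be illustrated with a minimal sketch. It is not the project's actual code: the record shape, the `normalize`, `verify`, and `gate` helpers, the `RegressionError` type, and the sample values are all hypothetical, and the LLM-judged DeepEval metrics are left out so the example runs on the standard library alone.

```python
from dataclasses import dataclass


class RegressionError(Exception):
    """Raised when a score drops below its baseline (hypothetical type)."""


@dataclass(frozen=True)
class ExtractedRecord:
    """One extracted value; the field names are illustrative assumptions."""
    record_id: str
    field: str
    value: str


def normalize(value: str) -> str:
    """Collapse whitespace and case so formatting noise is not flagged."""
    return " ".join(value.split()).casefold()


def verify(records, source):
    """Return the records whose value disagrees with the source data.

    `source` maps (record_id, field) to the reference value. A record with
    no source entry counts as a mismatch instead of passing silently.
    """
    mismatches = []
    for rec in records:
        expected = source.get((rec.record_id, rec.field))
        if expected is None or normalize(expected) != normalize(rec.value):
            mismatches.append(rec)
    return mismatches


def gate(accuracy: float, baseline: float) -> None:
    """Raise if accuracy is below the baseline, which would fail a CI job."""
    if accuracy < baseline:
        raise RegressionError(
            f"accuracy {accuracy:.3f} is below baseline {baseline:.3f}"
        )


# Made-up sample data, for illustration only.
records = [
    ExtractedRecord("EQ-001", "rated_voltage", "480 V"),
    ExtractedRecord("EQ-002", "rated_voltage", "4.16 kV"),
    ExtractedRecord("EQ-003", "rated_voltage", "120 V"),
]
source = {
    ("EQ-001", "rated_voltage"): "480 v",
    ("EQ-002", "rated_voltage"): "4.16 kV",
    ("EQ-003", "rated_voltage"): "125 V",
}

bad = verify(records, source)
accuracy = 1 - len(bad) / len(records)
```

In a CI job, a call such as `gate(accuracy, baseline)` would run after the evaluation step, with the baseline read from a stored value. Treating a missing source entry as a mismatch is a deliberate choice in this sketch: in a safety-critical setting, an unverifiable value is safer flagged than passed.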
Extraction accuracy rose from about 78% to about 95% or higher across more than 20,000 records. The verification layer caught roughly 40 incorrect records that the previous tooling had missed, and automation removed about 30 hours of manual work per month. The project anchored a $500K AI initiative and became the template for evaluation-gated AI delivery.