Diagram of a retrieval-augmented extraction pipeline with a CI eval gate and a rising accuracy gauge
Prior work · led by our founder
Major North American power utility · 2025 · Python · Azure OpenAI · RAG

RAG Extraction Pipeline for a Power Utility

THE CHALLENGE

A major power utility needed equipment-qualification data extracted from thousands of dense engineering documents at high accuracy. The existing manual, spreadsheet-driven process was slow and let errors through silently, which is unacceptable in a safety-critical engineering context.

OUR APPROACH

We built a retrieval-augmented Azure OpenAI extraction pipeline paired with a deterministic verification tool, and made evaluation a first-class part of delivery. A DeepEval suite combined deterministic scoring with LLM-judged faithfulness and hallucination checks, and was gated in CI so that any confidence regression fails the build before it can ship. A companion verification pass cross-checks every extracted value against source data.
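To make the deterministic half of that gate concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the project's code: the `Extraction` shape, the `normalize` rule, and the function names are all hypothetical, and the sketch does not use DeepEval's API or reproduce the LLM-judged faithfulness and hallucination checks.

```python
from dataclasses import dataclass

# Hypothetical key type: (record id, field name).
Key = tuple[str, str]


@dataclass(frozen=True)
class Extraction:
    """One extracted field value (illustrative shape, not the project's schema)."""
    record_id: str
    field: str
    value: str


def normalize(value: str) -> str:
    """Collapse whitespace and case so formatting differences are not scored as errors."""
    return " ".join(value.split()).casefold()


def accuracy(extracted: list[Extraction], expected: dict[Key, str]) -> float:
    """Deterministic score: fraction of labelled fields extracted correctly."""
    if not expected:
        raise ValueError("expected set is empty")
    found = {(e.record_id, e.field): normalize(e.value) for e in extracted}
    correct = sum(
        1 for key, want in expected.items() if found.get(key) == normalize(want)
    )
    return correct / len(expected)


def verify_against_source(
    extracted: list[Extraction], source: dict[Key, str]
) -> list[Extraction]:
    """Return every extraction that is absent from, or disagrees with, the source data."""
    mismatches = []
    for e in extracted:
        key = (e.record_id, e.field)
        if key not in source or normalize(source[key]) != normalize(e.value):
            mismatches.append(e)
    return mismatches


def gate(score: float, baseline: float) -> None:
    """Exit non-zero (failing a CI job) when the score drops below the baseline."""
    if score < baseline:
        raise SystemExit(f"eval gate failed: {score:.3f} < baseline {baseline:.3f}")
```

In a CI job, a script along these lines would call `gate(accuracy(...), baseline)` with a baseline stored from the last accepted run, so a regression stops the build, while `verify_against_source` would produce the list of records to send back for review.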

THE RESULTS

Extraction accuracy rose from about 78% to 95% or higher across more than 20,000 records. The verification layer caught roughly 40 incorrect records that the previous tooling had missed, and automation removed about 30 hours of manual work per month. The project anchored a $500K AI initiative and became the template for evaluation-gated AI delivery.

~78%→95%+
Extraction accuracy
20,000+
Records processed
$500K
AI initiative led
BUILT WITH
Python · Azure OpenAI · Azure Document Intelligence · DeepEval · MSSQL · Azure DevOps CI