Data Quality Governance for Industrial AI

When an AI agent can autonomously generate validation rules for safety-critical infrastructure data, the design challenge shifts from interface to governance: who defines "correct," who intervenes when the system is wrong, and how do you prevent operators from silently deferring to automation they haven't evaluated?

Cognite's Data Quality Service uses SHACL (Shapes Constraint Language), a W3C standard for expressing constraints on RDF graphs, to validate industrial equipment data across seven quality dimensions. An AI agent generates rules by analysing data patterns and contracts. The backend existed. What didn't exist was a governance model for the human–AI interaction, or any interface at all.

What I Did

I led a co-innovation hackathon with ~15 engineers, data scientists, product managers, and domain consultants from Cognite and a co-innovation partner. I facilitated the discovery session, then built the full working prototype myself in two days.

Sam facilitating discovery workshop with cross-functional team at Cognite
Co-innovation hackathon with engineers, data scientists, and domain consultants from Cognite and a co-innovation partner

Discovery surfaced three structural problems:

  • The person who writes rules is the same person who evaluates violations - meaning rule definition and violation triage can't live in separate workflows without forcing users to hold state across context switches.
  • Not all violations are the same. Some mean the rule is wrong. Others mean the data is wrong. If the interface doesn't make that distinction structural, users default to ignoring what they can't immediately fix.
  • Dimensions with no rules look identical to dimensions with no problems - a signal detection failure that creates false confidence in quality coverage.

The Governance Model

I defined a four-step loop that structures the entire system:

Define - Establish what "good data" means. Rules carry visible provenance (industry standard, AI-drafted, or human-authored) so users calibrate trust based on the source, not just the content.

Assess - Run validation deliberately, not silently. Users commit to a rule set and observe consequences before the system produces a baseline.

Monitor - Track violations over time. Unmonitored dimensions are shown explicitly as gaps, not omitted. Trend indicators surface slow drift before it compounds.
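One way a trend indicator like this could work is sketched below. It is an illustration under stated assumptions, not the prototype's code: the function names, the five-run window, and the tolerance of 0.5 violations per run are all hypothetical. It fits a least-squares slope to the most recent violation counts and labels the direction.

```python
from typing import Sequence


def violation_trend(counts: Sequence[int], window: int = 5) -> float:
    """Least-squares slope of the last `window` counts (violations per run)."""
    recent = list(counts[-window:])
    n = len(recent)
    if n < 2:
        return 0.0  # not enough history to estimate a trend
    mean_x = (n - 1) / 2
    mean_y = sum(recent) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(recent))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def trend_label(counts: Sequence[int], window: int = 5, tolerance: float = 0.5) -> str:
    """Map the slope to a coarse label; `tolerance` is an assumed threshold."""
    slope = violation_trend(counts, window)
    if slope > tolerance:
        return "worsening"
    if slope < -tolerance:
        return "improving"
    return "stable"
```

A slope over a window, rather than a run-to-run difference, is what lets slow drift show up even when no single run looks alarming.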

Act - Resolve through two structurally distinct paths: adjust the rule (governance decision) or escalate to the source system (operational fix). Separating these prevents task conflation and reduces decision errors.
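The two-path separation can be made structural in a data model as well as in the interface. The sketch below is a hypothetical illustration, and every class and field name in it is an assumption. Each resolution path produces a different type of object, so a rule change and a source-system escalation cannot be mistaken for each other downstream.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Diagnosis(Enum):
    RULE_IS_WRONG = "rule is wrong"
    DATA_IS_WRONG = "data is wrong"


@dataclass(frozen=True)
class Violation:
    rule_id: str
    record_id: str
    message: str


@dataclass(frozen=True)
class RuleChangeRequest:
    """Governance decision: the constraint itself needs adjusting."""
    rule_id: str
    reason: str


@dataclass(frozen=True)
class SourceEscalation:
    """Operational fix: the record must be corrected in the source system."""
    record_id: str
    reason: str


def resolve(violation: Violation, diagnosis: Diagnosis) -> Union[RuleChangeRequest, SourceEscalation]:
    # The caller must state a diagnosis explicitly; there is no default path.
    if diagnosis is Diagnosis.RULE_IS_WRONG:
        return RuleChangeRequest(violation.rule_id, violation.message)
    if diagnosis is Diagnosis.DATA_IS_WRONG:
        return SourceEscalation(violation.record_id, violation.message)
    raise ValueError(f"Unknown diagnosis: {diagnosis!r}")
```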

Data Quality prototype: linked split panel with rules on the left and violations on the right

Design Decisions

Linked split panel, not separate views. The core workspace co-presents rules and violations side by side. Early exploration tested a tab-based layout separating rule management from violation triage. In testing, users constantly switched between tabs to check whether a violation meant the rule was wrong or the data was wrong. The split panel eliminates that working memory cost - the relationship between constraint and failure is always visible.

Sandbox before activation. AI-generated rules are testable against real data before they go live, persistently tagged with their origin, and never auto-applied. This scaffolds trust without inducing complacency - the user remains the calibrator, not a rubber-stamper.
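A minimal sketch of the sandbox idea follows, assuming a simplified in-memory rule model rather than SHACL shapes. The `Rule` class, `sandbox_run`, and `activate` are hypothetical names, not part of Cognite's service. The rule is immutable, so its provenance tag cannot be dropped; a sandbox run only reports failures; activation requires a named approver.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Any, Callable, Iterable, List, Mapping


class Provenance(Enum):
    INDUSTRY_STANDARD = "industry standard"
    AI_DRAFTED = "AI-drafted"
    HUMAN_AUTHORED = "human-authored"


@dataclass(frozen=True)
class Rule:
    rule_id: str
    provenance: Provenance
    check: Callable[[Mapping[str, Any]], bool]  # returns True when a record passes
    active: bool = False  # rules are never active by default


def sandbox_run(rule: Rule, records: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Return the records that would fail, without activating the rule."""
    return [record for record in records if not rule.check(record)]


def activate(rule: Rule, approved_by: str) -> Rule:
    """Return an active copy of the rule; provenance is carried over unchanged."""
    if not approved_by:
        raise PermissionError("Activation requires a named human approver.")
    return replace(rule, active=True)
```

Returning a copy from `activate` rather than mutating the rule keeps the drafted version available for comparison after the decision is made.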

Making absence visible. As discovery showed, dimensions with no rules looked identical to dimensions with no problems. The interface now explicitly distinguishes "no violations found" from "not yet monitored" - preventing the false confidence that comes from mistaking silence for quality.
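The distinction can be expressed as a three-state status rather than a violation count, as in the hedged sketch below. The names `CoverageState` and `coverage_report` are hypothetical, and the dimension names in the usage example are placeholders, not the service's actual seven dimensions. A dimension with zero rules never reports as clean.

```python
from enum import Enum
from typing import Dict, Iterable, Mapping


class CoverageState(Enum):
    NOT_MONITORED = "not yet monitored"
    NO_VIOLATIONS = "no violations found"
    VIOLATIONS_FOUND = "violations found"


def coverage_report(
    dimensions: Iterable[str],
    rule_counts: Mapping[str, int],
    violation_counts: Mapping[str, int],
) -> Dict[str, CoverageState]:
    """Classify every dimension, including those with no rules at all."""
    report: Dict[str, CoverageState] = {}
    for dimension in dimensions:
        if rule_counts.get(dimension, 0) == 0:
            # No rules means nothing was checked, which is a gap and not a pass.
            report[dimension] = CoverageState.NOT_MONITORED
        elif violation_counts.get(dimension, 0) > 0:
            report[dimension] = CoverageState.VIOLATIONS_FOUND
        else:
            report[dimension] = CoverageState.NO_VIOLATIONS
    return report


# Usage with placeholder dimension names (assumed for illustration only).
report = coverage_report(
    ["completeness", "timeliness", "validity"],
    rule_counts={"completeness": 3, "timeliness": 2},
    violation_counts={"completeness": 4},
)
```

Iterating over the full list of dimensions, rather than over the rules or violations that happen to exist, is what prevents an unmonitored dimension from being silently omitted.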

Outcome

The governance architecture was validated with cross-functional stakeholders within the engagement. It established the structural foundation for Cognite's Data Quality Service - reframing the product from a validation pipeline with a UI to a human–AI governance system for industrial data trust.