Data Quality: The Foundation Every AI Project Needs

Heddwyn Coombs

Co-founder & Digital Director

April 14, 2026
7 min read

Eighty percent of AI initiatives fail before reaching production. The culprit isn't model complexity – it's bad data. This article examines what proper data quality assessment looks like, how migration transforms messy data into AI-ready systems, and what governance means for mid-market distributors.

Why 80% of AI Projects Never Leave the Pilot Phase

Eighty percent of AI initiatives fail before reaching production. The culprit isn't model complexity or algorithmic sophistication. It's bad data.

Businesses rush headlong into AI implementations without addressing fundamental data hygiene. Product catalogues riddled with inconsistencies. Customer records scattered across five systems with no master reference. Pricing data that takes three days to synchronise between ERP and commerce platform.

When the AI model encounters this chaos, it produces garbage. The pilot gets shelved. The budget gets reassigned. Everyone concludes AI "isn't ready" for their business.

The reality is simpler: you can't build decision intelligence on a foundation of questionable data. This article examines what proper data quality assessment looks like, how migration transforms messy data into AI-ready systems, and what governance means for a mid-market distributor with 200 employees rather than a data science team of 50.

The Hidden Cost of Poor Data Quality in Commerce Operations

Poor data quality compounds across every operational decision. A single inconsistent product record cascades into pricing errors, fulfilment delays, and customer confusion.

Consider a foodservice distributor managing 12,000 SKUs across multiple suppliers. Product codes vary between systems. Descriptions are inconsistent. Unit measures don't align. The pricing team spends hours each week manually reconciling data before they can even begin making pricing decisions.

We documented this exact scenario with a £60M distributor. Pricing decisions took three days because the first two days were spent cleaning data. Margin leakage exceeded £200K annually from pricing errors caused by inconsistent product information (WithPraxis client data, 2024).

Inventory management suffers similarly. Incorrect stock levels trigger unnecessary purchase orders. Phantom inventory leads to stockouts. A building materials client reduced fulfilment costs by 18% simply by cleaning their inventory data and enabling accurate availability checks across their eight depots.

Customer data presents another challenge. Duplicate records create confusion. Incomplete contact information blocks automated communications. Outdated pricing agreements lead to manual intervention on every order. These aren't edge cases – they're the daily reality for most B2B distributors operating with legacy systems and spreadsheet workflows.

Data Quality Assessment: What We Actually Look For

Data quality assessment goes beyond counting blank fields. We examine five dimensions: completeness, consistency, accuracy, timeliness, and validity.

Completeness measures whether critical data is present. Not every field needs to be populated, but core operational data must exist: product descriptions, base costs, customer contact details, inventory locations. Missing data blocks automated decisions and forces manual intervention.

Consistency identifies conflicting information across systems. The same product with three different category assignments. Customer addresses that don't match between CRM and billing system. Unit measures that vary between purchasing and sales records. Inconsistency creates decision paralysis.

Accuracy compares data against authoritative sources. Supplier product codes that don't match manufacturer specifications. Customer credit limits that don't reflect current agreements. Pricing that hasn't been updated for seasonal changes. Inaccurate data leads to wrong decisions.

Timeliness measures data freshness. Inventory levels updated weekly when orders happen daily. Customer contact information last verified two years ago. Pricing based on commodity costs from three months ago. Stale data makes real-time decisions impossible.

Validity checks data format and business rules. Email addresses without @ symbols. Phone numbers with letters. Product weights listed as negative values. Invalid data breaks automation and requires manual correction.

Our assessment produces a data quality scorecard across these dimensions. Most mid-market distributors score 40–60% on their first assessment. The scorecard identifies which systems need attention and prioritises remediation by business impact.
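
To make these dimensions concrete, here's a minimal sketch in Python of the sort of automated checks an assessment can run. The record fields, the email pattern, and the per-dimension scoring are illustrative assumptions rather than our actual tooling; consistency, accuracy, and timeliness are omitted because they need cross-system reference data.

```python
import re

# Illustrative product records; the field names are assumptions for this sketch.
records = [
    {"sku": "A-100", "description": "Olive oil 5L", "unit_cost": 12.40,
     "email": "buyer@example.com", "weight_kg": 5.2},
    {"sku": "A-101", "description": "", "unit_cost": None,
     "email": "no-at-sign", "weight_kg": -1.0},
]

REQUIRED = ("sku", "description", "unit_cost")

def complete(record):
    # Completeness: core operational fields must be populated.
    return all(record.get(f) not in (None, "") for f in REQUIRED)

def valid(record):
    # Validity: format and business-rule checks, e.g. well-formed
    # email addresses and non-negative weights.
    ok_email = bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record.get("email") or ""))
    ok_weight = isinstance(record.get("weight_kg"), (int, float)) and record["weight_kg"] >= 0
    return ok_email and ok_weight

# A simple scorecard: the share of records passing each dimension.
for name, check in (("completeness", complete), ("validity", valid)):
    passed = sum(check(r) for r in records)
    print(f"{name}: {100 * passed / len(records):.0f}%")
```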

Data Migration: The Bridge Between Current Reality and AI-Ready Systems

Data migration isn't just moving files between systems. It's transforming messy operational data into a structure that supports decision intelligence.

The process starts with data mapping. We document where each piece of information lives, how it's formatted, and what business rules apply. Product data might exist in the ERP, commerce platform, and supplier feeds – each with different schemas and validation rules.

Transformation rules clean and standardise data during migration. Product categories get normalised to a master taxonomy. Customer addresses are validated against postal databases. Pricing is converted to consistent currency and unit measures. Unit costs are calculated using standard formulas rather than manual spreadsheet calculations.
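
As an illustration, a transformation rule can be as simple as a lookup against the master taxonomy plus a unit conversion. The taxonomy entries and conversion factors below are invented for the example; the real mappings come out of the data-mapping exercise described above.

```python
# Invented master taxonomy and unit table, for illustration only.
TAXONOMY = {"cooking oil": "Oils", "olive oils": "Oils", "oils & fats": "Oils"}
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.453592}

def transform(raw):
    """Apply standardisation rules to one raw product record."""
    return {
        "sku": raw["sku"].strip().upper(),
        # Normalise free-text categories to the master taxonomy;
        # unmapped values are flagged rather than guessed.
        "category": TAXONOMY.get(raw["category"].strip().lower(), "UNMAPPED"),
        # Convert whatever unit the source system used into kilograms.
        "net_weight_kg": round(raw["weight"] * TO_KG[raw["weight_unit"]], 3),
    }

print(transform({"sku": " a-100 ", "category": "Cooking Oil",
                 "weight": 5, "weight_unit": "lb"}))
# {'sku': 'A-100', 'category': 'Oils', 'net_weight_kg': 2.268}
```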

Validation checkpoints ensure transformed data meets quality standards before loading into target systems. We run completeness checks, business rule validation, and cross-reference verification. Data that fails validation gets flagged for manual review rather than corrupting the target system.
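
A checkpoint along these lines might look like the sketch below, which routes failing records to a review queue rather than the target system. The two rules shown are placeholders.

```python
def checkpoint(records, rules):
    """Split records into loadable rows and a manual-review queue."""
    loadable, review = [], []
    for record in records:
        failures = [name for name, rule in rules.items() if not rule(record)]
        if failures:
            review.append((record, failures))  # flagged, never loaded
        else:
            loadable.append(record)
    return loadable, review

rules = {
    "has_category": lambda r: r["category"] != "UNMAPPED",
    "positive_weight": lambda r: r["net_weight_kg"] > 0,
}

loadable, review = checkpoint(
    [{"sku": "A-100", "category": "Oils", "net_weight_kg": 2.268},
     {"sku": "B-200", "category": "UNMAPPED", "net_weight_kg": 0.0}],
    rules,
)
print(len(loadable), "ready to load")       # 1 ready to load
print([(r["sku"], f) for r, f in review])   # [('B-200', ['has_category', 'positive_weight'])]
```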

The migration includes rollback procedures. If validation identifies issues after go-live, we can revert to previous data states while fixes are implemented. This reduces risk and allows for iterative improvement rather than big-bang transformations.
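
One lightweight way to support rollback, assuming an in-memory staging table purely for illustration, is a snapshot taken immediately before each load. Production migrations would typically rely on database backups or versioned tables instead.

```python
import copy

class MigrationLoad:
    """Snapshot-and-restore rollback for a single load step."""
    def __init__(self, target_table):
        self.target = target_table
        self.snapshot = None

    def load(self, new_rows):
        self.snapshot = copy.deepcopy(self.target)  # capture pre-load state
        self.target.extend(new_rows)

    def rollback(self):
        self.target[:] = self.snapshot  # revert in place to the snapshot

table = [{"sku": "A-100"}]
run = MigrationLoad(table)
run.load([{"sku": "BAD-ROW"}])
run.rollback()
print(table)  # [{'sku': 'A-100'}]: the bad load is fully reverted
```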

For a £40M building materials distributor, migration consolidated inventory data from four systems into a single source of truth. Real-time availability checks became possible. Route optimisation could access accurate depot stock levels. Fulfilment costs dropped 18% within 90 days of migration completion.

Governance That Actually Works for Mid-Market Distributors

Data governance for mid-market businesses must be practical, not academic. Companies with 200 employees can't implement enterprise-level data management programmes. They need lightweight processes that integrate with existing workflows.

Data ownership starts with assigning responsibility for each data domain. Marketing owns customer contact information. Purchasing owns supplier and product data. Sales owns pricing and customer commercial terms. Operations owns inventory and fulfilment data. Clear ownership prevents data quality from becoming everyone's problem and therefore nobody's priority.

Quality monitoring happens through automated checks rather than manual auditing. We implement data quality dashboards that highlight exceptions – duplicate customer records, products without categories, pricing outside normal ranges. Operations teams review exceptions during weekly meetings rather than conducting comprehensive data audits.
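
The checks behind such a dashboard are usually unglamorous queries. Here's a sketch of a weekly exception report, with invented field names and thresholds:

```python
from collections import Counter

def exception_report(customers, products):
    """Collect the exceptions a weekly ops review would work through."""
    email_counts = Counter(c["email"].lower() for c in customers)
    return {
        "duplicate_customers": [e for e, n in email_counts.items() if n > 1],
        "products_without_category": [p["sku"] for p in products if not p.get("category")],
        # Price outside the product's configured normal range.
        "price_outliers": [p["sku"] for p in products
                           if not (p["price_floor"] <= p["price"] <= p["price_ceiling"])],
    }

report = exception_report(
    customers=[{"email": "a@x.com"}, {"email": "A@x.com"}, {"email": "b@x.com"}],
    products=[{"sku": "A-100", "category": "Oils", "price": 14.0,
               "price_floor": 10.0, "price_ceiling": 16.0},
              {"sku": "B-200", "category": None, "price": 42.0,
               "price_floor": 10.0, "price_ceiling": 16.0}],
)
print(report)
# {'duplicate_customers': ['a@x.com'],
#  'products_without_category': ['B-200'],
#  'price_outliers': ['B-200']}
```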

Update procedures define how data changes flow through systems. New products get validated against master taxonomy before publication. Customer address changes trigger validation checks. Pricing updates require approval workflow based on margin impact. Structured procedures prevent ad-hoc changes that introduce inconsistencies.
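
The margin-impact rule can be a single function. The two-percentage-point threshold below is an arbitrary example, not a recommendation:

```python
def requires_approval(old_price, new_price, unit_cost, threshold_pts=2.0):
    """Route a price change for approval when it shifts margin by more
    than threshold_pts percentage points."""
    old_margin = (old_price - unit_cost) / old_price * 100
    new_margin = (new_price - unit_cost) / new_price * 100
    return abs(new_margin - old_margin) > threshold_pts

# Dropping £10.00 to £9.00 on a £7.00 cost moves margin from 30.0% to 22.2%.
print(requires_approval(10.00, 9.00, 7.00))  # True: goes to approval
```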

Integration with existing workflows means data quality checks happen within normal operations rather than as separate activities. Order processing includes customer record validation. Product setup includes completeness checks. Inventory updates trigger cross-system synchronisation. Quality management becomes part of operational routine.

The goal is sustainable data quality that doesn't require dedicated data teams. Our Data Quality & Migration service establishes these processes during implementation, then trains internal teams to maintain them.

What Good Data Foundation Enables: From Clean Data to Decision Intelligence

Clean data transforms from operational necessity to competitive advantage when it enables AI-powered capabilities. The same data quality investments that reduce manual errors also power intelligent automation.

Dynamic Pricing Intelligence requires accurate product costs, customer pricing history, and competitive information. Without clean data, AI pricing recommendations become unreliable. With a proper data foundation, our clients achieve 6% margin improvement and 90% reduction in pricing errors.

Predictive Replenishment System depends on accurate inventory records, customer ordering patterns, and supplier lead times. Poor data quality means inaccurate predictions and excess inventory. Clean data enables 60–80% reduction in manual ordering and 25% inventory reduction.

AI Search needs consistent product descriptions, accurate categorisation, and complete specification data. Inconsistent product information degrades search relevance. Properly structured data improves search accuracy by 40–60% and reduces phone orders as customers find products online.

Commerce intelligence capabilities – from fraud detection to customer segmentation – all depend on data quality. The investment in getting foundational data right pays dividends across every subsequent AI application.

An £80M industrial distributor illustrates this multiplier effect. After data quality remediation, they deployed AI search, predictive inventory, and automated pricing within six months. Each capability built on the same clean data foundation, accelerating deployment and improving outcomes.

Conclusion: Building on Solid Ground

Data quality isn't a glamorous project, but it's the foundation every successful AI initiative requires. Skip this step, and your AI pilots will produce inconsistent results and fail to reach production. Invest in proper data assessment, migration, and governance, and every subsequent capability becomes faster to deploy and more reliable in operation.

The businesses succeeding with AI aren't those with the most sophisticated algorithms. They're those with the cleanest, most reliable operational data. Getting your data foundation right is the first step toward meaningful AI outcomes.

Learn more about Data Quality & Migration.

Themes

AI Implementation Strategy · Commerce Operations Intelligence

Heddwyn Coombs

Co-founder & Digital Director

Heddwyn is a co-founder of WithPraxis. He has spent 30 years helping mid-market businesses make better operational decisions — first in commerce technology, now in applied AI. He works directly with MDs and ops directors to design and implement AI that earns its keep.
