Data readiness is the unglamorous prerequisite that determines whether an AI project is viable before any technology decision is made. It is not the most interesting part of the conversation. It does not make for compelling demos. And it is the single most common reason well-intentioned AI projects fail in practice.
The gap between a company's assumed data readiness and its actual data readiness is almost always larger than expected. The data exists — but it is distributed across systems that do not communicate. The records are there — but they are inconsistently formatted, partially filled, or reliant on interpretations that vary by team. The outcomes are captured — but not in a way that an AI system can use as a training signal.
An AI system learns what your data teaches it. If your data teaches it noise, it will learn noise precisely.
Six Conditions We Check Before Recommending Any Build
These are the six data conditions we assess in every AI Opportunity Audit. A project is viable when most of these hold. When several do not, the first step is establishing them — not building around them.
- Volume. There must be enough historical examples of the behaviour the system is meant to learn. The specific threshold varies by problem type, but as a general guide: for classification tasks, several hundred labelled examples per class; for prediction tasks, enough historical outcomes to identify reliable patterns. Below that threshold, a model will overfit to noise and generalise poorly.
- Outcome labels. For supervised learning — which covers most practical business AI applications — the training data must include known outcomes. If the system is meant to predict which leads convert, the historical data must include records of which leads actually converted. If those outcomes were never recorded, or were recorded inconsistently, the system has no signal to learn from.
- Consistency. The same process must have been executed consistently enough that historical data reflects a stable underlying pattern. If the process changed significantly — a new CRM, a revised sales methodology, a restructured team — data from before the change may not be a reliable predictor of behaviour under the new conditions. Recency and stability of the data both matter.
- Accessibility. The data must be retrievable in a format the system can ingest. Structured data in a queryable database is ready. Relevant information locked inside PDF documents, email threads, or systems with no export capability is not — without preprocessing work to extract and normalise it. That preprocessing cost should be included in any build estimate.
- Completeness. Critical fields must not be systematically empty. A lead scoring model that relies on company revenue data will not function if revenue is captured for only 30% of records. A customer churn predictor that depends on usage data will not function if usage logging was only implemented recently. Identifying which fields are essential and checking their fill rates is a basic data readiness step that is often skipped.
- Attribution. The system must be able to associate inputs with outcomes at the correct level of granularity. If leads are tracked by company but outcomes are recorded by individual contact, the attribution chain is broken. If campaign spend is logged at the channel level but conversions are tracked at the campaign level, the system cannot connect spend to the conversions it produced. Attribution alignment is a data architecture question that has to be resolved before modelling.
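Several of these conditions can be checked programmatically before any modelling work begins. A minimal sketch of the volume, label, and completeness checks, assuming records arrive as a list of dictionaries — the field names and thresholds here are illustrative, not prescriptive:

```python
from collections import Counter

def audit_readiness(records, label_field, required_fields,
                    min_per_class=300, min_fill_rate=0.8):
    """Run basic volume, outcome-label, and completeness checks on raw records."""
    findings = {}

    # Outcome labels: what fraction of records have a recorded outcome?
    labels = [r.get(label_field) for r in records]
    labelled = [label for label in labels if label is not None]
    findings["label_coverage"] = len(labelled) / len(records)

    # Volume: count labelled examples per class and flag thin classes.
    per_class = Counter(labelled)
    findings["underrepresented_classes"] = [
        cls for cls, n in per_class.items() if n < min_per_class
    ]

    # Completeness: fill rate for each field the model would depend on.
    findings["low_fill_fields"] = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        rate = filled / len(records)
        if rate < min_fill_rate:
            findings["low_fill_fields"][field] = round(rate, 2)

    return findings
```

A report like this makes the gap concrete: a class with 40 labelled examples, or a revenue field filled on 30% of records, becomes a scoping fact rather than a surprise discovered mid-build.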
What to Do When Data Is Not Ready
The answer is not to wait indefinitely. It is to treat data readiness as a project with a defined end state and a realistic timeline. In most cases, the gap between current state and ready state can be closed in weeks to months — not years — if the work is scoped correctly.
The practical steps typically involve: identifying the minimum data requirements for the target AI system; auditing current data against those requirements; resolving the most critical gaps through process changes, system configuration, or lightweight data collection; and setting a timeline for when the data will meet the threshold to support a build.
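The audit step in that sequence can be sketched as a simple comparison of minimum requirements against measured values. A hypothetical shape for that comparison, assuming each requirement is a named metric with a numeric threshold (the metric names are examples, not a fixed schema):

```python
def readiness_gaps(requirements, measured):
    """Compare measured data metrics against minimum requirements.

    requirements / measured: dicts mapping a metric name to a value,
    e.g. {"labelled_examples": 500, "revenue_fill_rate": 0.8}.
    Returns only the metrics that fall short, with required vs actual.
    """
    gaps = {}
    for metric, minimum in requirements.items():
        actual = measured.get(metric, 0)  # missing metric counts as zero
        if actual < minimum:
            gaps[metric] = {"required": minimum, "actual": actual}
    return gaps
```

The output is the readiness project's scope in miniature: each entry is a gap to close through process change, system configuration, or data collection, and the timeline follows from how fast each gap can be closed.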
That work is less exciting than building a model. It is also the work that determines whether the model will be useful when built. We treat data readiness as a deliverable in its own right — because in most engagements, it is.
Before asking what AI can do for your business, it is worth asking what your data will allow AI to do. The answer to that question is the honest starting point for any build conversation.