Every board meeting now includes the question. Every investor call. Every strategic planning session. It arrives with the weight of inevitability and the specificity of fog: "What's your AI strategy?"
The honest answer, for most mid-market companies, is that they don't have one — and that this is not the problem they think it is. The problem they think they have is that they're behind. The problem they actually have is that the foundation their AI strategy would need to stand on doesn't exist yet, and building AI without that foundation would be like installing a navigation system in a car whose speedometer lies.
Ethan Mollick, whose Co-Intelligence is the most practically grounded book written about AI in a business context, makes a point that should be tattooed on every corporate strategy document: AI is a general-purpose technology, not a product. Its value is determined entirely by what it's applied to, by whom, and with what underlying infrastructure. A general-purpose technology applied to a well-governed information environment produces leverage. The same technology applied to a fragmented, inconsistently defined, partially reconciled data environment produces confident nonsense at unprecedented scale.
The AI question is not "should we adopt AI?" The answer to that is obviously yes, eventually, in some form. The AI question is: "Is the information our AI will consume trustworthy enough to make the AI's output worth acting on?" For most companies, the honest answer is no — and the path to yes runs through infrastructure, not intelligence.
What AI Actually Does (and What It Doesn't)
AI, stripped of its marketing language, does a small number of things extraordinarily well. It recognizes patterns in large datasets. It generates predictions based on historical relationships. It produces natural language summaries and analyses from structured and unstructured inputs. It automates classification tasks that previously required human judgment. And it does all of these things at speeds and volumes that manual processes cannot match.
What AI does not do — and this is the gap that most implementation failures fall into — is evaluate the quality of its own inputs. An AI model asked to forecast cash flow from a dataset where "revenue" means three different things in three different systems will produce a forecast. The forecast will be mathematically coherent. It will be presented with confidence intervals and trend analysis. It will look like a professional financial document. And it will be built on a definitional inconsistency that the model had no way to detect and no reason to flag.
Agrawal, Gans, and Goldfarb — the Toronto economists whose Prediction Machines remains the clearest economic framework for understanding AI — frame this precisely. AI reduces the cost of prediction. It does not reduce the cost of judgment, which requires understanding what the prediction means in context. And it does not reduce the cost of data quality, which determines whether the prediction reflects reality or an artifact of how the data was assembled.
The practical consequence is that AI has a multiplier effect on whatever it's applied to. Applied to clean, governed, consistently defined data, it multiplies the organization's analytical capacity. Applied to messy, ungoverned, inconsistently defined data, it multiplies the organization's capacity to produce authoritative-looking analysis that is wrong in ways that are harder to detect than the manual version was, because the manual version at least had a person in the loop who might have noticed that something didn't add up.
The Data Readiness Gap
The gap between AI ambition and AI readiness is not a matter of opinion. It's extensively documented.
Finance AI adoption surged from 37% to 58% of organizations between 2023 and 2024, then essentially flatlined — a plateau that reflects the collision between enthusiasm and infrastructure reality. The single largest obstacle to AI adoption in finance, across consecutive years of Gartner's surveys, is the same answer: inadequate data quality and availability. Not model sophistication. Not talent. Not budget. The data.
The numbers behind that obstacle are worth sitting with. Only 12% of organizations report that their data is of sufficient quality and accessibility for effective AI implementation. Two-thirds of data leaders doubt their data is AI-ready. And here's the finding that should concern anyone planning an AI investment: 80% of organizations believed their data was ready for AI before they started implementing. Then 52% encountered significant quality challenges once they actually tried.
That confidence gap — between what organizations believe about their data and what they discover when they stress-test it — is the most expensive gap in the current AI landscape. It means that the typical AI project begins with an assumption about data quality that is almost certainly too optimistic, proceeds through a proof of concept that works because the proof-of-concept dataset was manually curated, and then fails at scale because the production data environment is nothing like the curated dataset.
The result: roughly half of generative AI projects are abandoned after proof of concept. Not because the AI didn't work. Because the data it needed to work on wasn't ready.
The Amplification Problem
The consequences of deploying AI on ungoverned data are not merely "less accurate results." They're a specific and dangerous failure mode that Brian Christian, in The Alignment Problem, describes as one of the central challenges of deploying AI in real-world systems: the system optimizes for what it was given, not for what was intended.
An AI model analyzing revenue trends doesn't know that "revenue" means something different in the CRM than it does in the ERP. It processes both inputs as if they describe the same thing. The resulting trend analysis blends two different measurements into a single output that is internally consistent — the math is correct — but externally meaningless, because it's analyzing a composite entity that doesn't correspond to any real business quantity.
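To make that concrete with invented figures: suppose the CRM's "revenue" is bookings recorded at contract signing, the ERP's "revenue" is recognized revenue, and a naive pipeline pools the two into one series. A minimal sketch, purely illustrative:

```python
# Illustrative only: invented figures for two systems that both label a field "revenue".
# The CRM records bookings at signing; the ERP records recognized revenue.
crm_revenue = {"Q1": 1_200_000, "Q2": 1_500_000, "Q3": 1_100_000}   # bookings
erp_revenue = {"Q1":   950_000, "Q2": 1_020_000, "Q3": 1_080_000}   # recognized

# A naive pipeline that pools both sources into a single "revenue" series.
blended = {q: (crm_revenue[q] + erp_revenue[q]) / 2 for q in crm_revenue}

for quarter, value in blended.items():
    print(quarter, f"{value:,.0f}")
# The arithmetic is internally consistent, but the resulting series measures neither
# bookings nor recognized revenue: a composite no part of the business actually earns.
```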
This isn't a hypothetical edge case. It's the default condition in any organization where metrics haven't been governed through a shared definition layer. And the problem compounds in a way that manual analysis doesn't. A human analyst encountering two different revenue numbers from two different systems will usually pause, investigate, and either reconcile or flag the discrepancy. The analyst might be slow, but their slowness includes a quality check that's built into the human cognitive process: the instinct that something doesn't look right.
AI doesn't have that instinct. It processes the inputs it receives with perfect equanimity. Consistent data and inconsistent data look the same from the model's perspective — they're both matrices of numbers. The quality check that the human analyst provides for free has to be engineered into the AI system explicitly, through the governance layer that ensures the inputs are trustworthy before the model ever sees them.
Oxford researchers documented a related phenomenon they call "model collapse" — what happens when AI models are trained on data that includes AI-generated outputs from previous iterations. Each generation degrades slightly from the one before, like taking a photograph of a photograph until only noise remains. The business equivalent is an AI system that generates analyses from ungoverned data, which are then used as inputs for subsequent analyses, each iteration compounding the definitional inconsistencies of the one before. The outputs look increasingly sophisticated. The connection to reality loosens with each cycle.
The Ladder, Not the Leap
The honest path to AI value in finance and operations is not a leap. It's a ladder — a sequence of capabilities, each dependent on the one beneath it, each producing value independently while creating the conditions for the next level to work.
Rung 1: Governed data. Before any AI is involved, the organization establishes the foundational integrity that every subsequent rung depends on. Systems of record are designated — one authoritative source per data domain. Definitions are written and shared — "revenue" means one thing, computed one way, from one source. Reconciliation cadences are installed — critical data is verified against authoritative evidence on a rhythm that catches errors before they propagate. This rung produces value immediately, with or without AI: faster closes, fewer reconciliation meetings, more trusted reports. It also produces the data environment that makes every subsequent rung reliable.
Rung 2: Automated monitoring. Rule-based automation — not machine learning, just programmatic logic — applied to the governed data. Threshold alerts that fire when cash drops below a defined level. Reconciliation checks that run nightly and flag discrepancies. Variance detection that identifies when a metric moves outside its expected range. This rung doesn't require sophisticated AI. It requires consistent data and explicit rules about what constitutes an anomaly. The value is early warning: problems surface when they're small, not when they've compounded into the quarterly surprise.
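A sketch of how little machinery this rung actually requires. Everything below is ordinary programmatic logic; the thresholds, tolerances, and function names are invented for illustration, and in practice the values come from the governance layer rather than from code:

```python
from statistics import mean, stdev

# Illustrative parameters; real ones are set and owned by the governance layer.
CASH_FLOOR = 500_000          # alert when cash drops below this level
VARIANCE_SIGMAS = 3.0         # flag metrics far outside their recent range

def check_cash_floor(cash_balance: float) -> list[str]:
    """Threshold alert: fires when cash drops below a defined level."""
    if cash_balance < CASH_FLOOR:
        return [f"ALERT: cash balance {cash_balance:,.0f} is below floor {CASH_FLOOR:,.0f}"]
    return []

def check_reconciliation(ledger_total: float, bank_total: float, tolerance: float = 0.01) -> list[str]:
    """Nightly reconciliation check: flags discrepancies between two authoritative sources."""
    gap = abs(ledger_total - bank_total)
    if gap > tolerance:
        return [f"ALERT: ledger vs. bank discrepancy of {gap:,.2f}"]
    return []

def check_variance(history: list[float], latest: float) -> list[str]:
    """Variance detection: flags a metric that moves outside its expected range."""
    if len(history) < 2:
        return []
    mu, sigma = mean(history), stdev(history)
    if sigma and abs(latest - mu) > VARIANCE_SIGMAS * sigma:
        return [f"ALERT: latest value {latest:,.0f} deviates sharply from recent mean {mu:,.0f}"]
    return []
```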
Rung 3: Pattern recognition. This is where machine learning earns its keep — applied to data that has been governed at Rung 1 and monitored at Rung 2. Anomaly detection that learns what "normal" looks like for this business and flags deviations that rule-based systems wouldn't catch. Forecasting models that incorporate more variables and longer histories than a spreadsheet can manage. Classification systems that automate categorization tasks — expense coding, revenue allocation, customer segmentation — that currently consume hours of human judgment.
The critical distinction at this rung: the models are operating on governed data with known definitions, which means their outputs can be evaluated against a standard. When the anomaly detector flags something, the investigation starts from a trusted baseline. When the forecast diverges from actuals, the variance analysis can identify whether the model was wrong or the inputs changed — because the inputs are governed well enough to make that distinction.
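One plausible shape for this rung, sketched with scikit-learn's IsolationForest on invented data; the tooling and the features are assumptions made for illustration, not a recommendation. The point is that the model learns what "normal" looks like from governed history, so a flag is a reason to investigate rather than a reason to second-guess the inputs:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical governed history: daily values of one consistently defined metric
# (say, net cash movement), plus a simple calendar feature.
rng = np.random.default_rng(0)
history = np.column_stack([
    rng.normal(100_000, 15_000, 365),   # daily net cash movement
    np.arange(365) % 7,                 # day of week
])

# Learn what "normal" looks like for this business from the governed history.
detector = IsolationForest(contamination=0.02, random_state=0).fit(history)

# Score today's observation; -1 means "investigate", 1 means "within learned norms".
today = np.array([[310_000, 2]])
label = detector.predict(today)[0]
print("anomaly" if label == -1 else "normal")
```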
Rung 4: Decision support. AI-generated analysis, scenario modeling, and recommendation systems that augment executive judgment. What happens to cash flow if collections slow by five days? What's the margin impact of shifting the product mix toward the higher-volume, lower-margin offering? If we hire three people in Q3, what does the P&L look like under optimistic, base, and stress scenarios? These questions, answered by AI systems operating on governed data, produce genuine decision leverage — faster scenario analysis, more comprehensive sensitivity testing, richer context for the judgment calls that still require human intelligence.
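The collections question above has a simple skeleton once its inputs are governed. The figures below are invented to show the arithmetic, not a model of any real business:

```python
# Invented inputs for illustration; in practice each comes from a governed source.
annual_revenue = 24_000_000          # recognized revenue, one definition, one source
cash_on_hand = 1_800_000
monthly_burn = 400_000               # net operating outflow

def cash_runway_months(dso_slip_days: float) -> float:
    """Months of runway if collections slow by `dso_slip_days` days."""
    daily_revenue = annual_revenue / 365
    extra_receivables = daily_revenue * dso_slip_days   # cash temporarily trapped in AR
    available = cash_on_hand - extra_receivables
    return available / monthly_burn

for slip in (0, 5, 10):
    print(f"DSO +{slip} days -> {cash_runway_months(slip):.1f} months of runway")
```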
Rung 5: Autonomous execution. The most advanced rung, and the one that requires the most trust in the underlying infrastructure. AI systems that don't just recommend but act: automated invoice processing with approval routing, dynamic cash positioning, algorithmic vendor payment optimization, real-time pricing adjustments based on capacity and demand signals. Each autonomous action has a defined scope, explicit boundaries, and a kill switch — a mechanism for a human to override or halt the automation when conditions fall outside the parameters it was designed for.
Every automated rung needs its own kill switch, and the kill-switch philosophy matters. AI systems should be designed so that reverting to human control is always possible, always fast, and never requires understanding the model's internals. The question is not "will the AI make mistakes?" — it will. The question is "when the AI makes a mistake, how quickly can a human detect it and intervene?" The answer depends entirely on the governance infrastructure at Rungs 1 and 2.
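A sketch of what that looks like in practice, with illustrative names and limits: every autonomous action is checked against explicit, human-defined boundaries before it executes, and a halt flag that any human can set takes precedence over whatever the system wants to do.

```python
from dataclasses import dataclass

@dataclass
class AutomationScope:
    """Explicit boundaries for an autonomous action, defined by humans, not learned."""
    max_payment: float = 25_000        # single-payment ceiling (illustrative)
    max_daily_total: float = 100_000   # aggregate daily ceiling (illustrative)
    halted: bool = False               # the kill switch: flip to True and nothing executes

def execute_vendor_payment(amount: float, paid_today: float, scope: AutomationScope) -> str:
    # The kill switch is checked first and requires no understanding of the model's internals.
    if scope.halted:
        return "halted: routed to human review"
    if amount > scope.max_payment or paid_today + amount > scope.max_daily_total:
        return "outside defined scope: routed to human approval"
    return f"executed payment of {amount:,.0f}"

scope = AutomationScope()
print(execute_vendor_payment(8_000, paid_today=40_000, scope=scope))   # executed
scope.halted = True                                                    # a human intervenes
print(execute_vendor_payment(8_000, paid_today=40_000, scope=scope))   # halted
```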
What "AI-Ready" Actually Means
The term "AI-ready" has become a marketing category — vendors use it to sell platforms, consultants use it to sell assessments, and organizations use it to describe an aspiration without specifying what it means. It should mean something specific.
An AI-ready organization has four properties.
The first is definitional consistency. Every metric the AI will consume has one definition, documented, with an owner responsible for its integrity. The AI doesn't encounter "revenue" from one system and "revenue" from another system and have to guess whether they measure the same thing. They measure the same thing because the definition layer enforces it.
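One way a definition layer can be made concrete, sketched with invented fields. The substance is that each metric has exactly one computation, one source of record, and one accountable owner, and the pipeline refuses inputs that are not registered:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    name: str
    source_system: str     # the designated system of record
    computation: str       # the single, documented way the metric is derived
    owner: str             # the person accountable for its integrity

# Illustrative registry entry; real definitions live in a governed catalog, not in code.
REGISTRY = {
    "revenue": MetricDefinition(
        name="revenue",
        source_system="ERP",
        computation="recognized revenue, monthly",
        owner="Controller",
    ),
}

def load_metric(name: str, source_system: str) -> MetricDefinition:
    """Gate every AI input through the definition layer."""
    definition = REGISTRY.get(name)
    if definition is None or definition.source_system != source_system:
        raise ValueError(f"'{name}' from {source_system} is not a governed input")
    return definition
```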
The second is data lineage. For every input the AI processes, the organization can trace the data's path from its authoritative source through every transformation to the point where the AI receives it. When the AI produces an unexpected output, the investigation can follow the lineage backward to identify whether the issue is in the model or in the data — and if in the data, exactly where.
The third is reconciled freshness. The data the AI operates on is current enough for the decisions it's informing, and its currency is verified through reconciliation rather than assumed. An AI system making weekly cash recommendations from data that hasn't been reconciled since last month is not AI-ready. It's AI-dangerous.
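A sketch of how that verification can be enforced, with an invented staleness budget: the pipeline checks when the data was last reconciled against its authoritative source and refuses to feed the model anything older than the decision cadence.

```python
from datetime import date, timedelta

MAX_STALENESS = timedelta(days=7)   # illustrative: weekly decisions need weekly reconciliation

def assert_reconciled(last_reconciled: date, as_of: date | None = None) -> None:
    """Refuse to run if the data's last reconciliation is older than the decision cadence."""
    as_of = as_of or date.today()
    if as_of - last_reconciled > MAX_STALENESS:
        raise RuntimeError(
            f"data last reconciled {last_reconciled}: too stale for a weekly recommendation"
        )
```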
The fourth is exception governance. When the AI produces an output that conflicts with human judgment, there is a defined process for investigation and resolution. Not "the human overrides the AI" — that's one possible outcome. The process determines whether the AI caught something the human missed, or the human caught something the AI couldn't see, and the system learns from the resolution. Without exception governance, the organization can't improve either the AI or the humans' use of it.
These four properties are infrastructure, not intelligence. They don't require any AI to build. They require the same data governance discipline that produces trustworthy reports, reliable closes, and meetings that start with decisions rather than reconciliation. AI readiness is a byproduct of operational integrity.
The Honest Conversation
There are two versions of the AI conversation happening in companies right now.
The first version starts with the technology. What tools should we adopt? Which vendor has the best platform? How do we build a data lake? What's our prompt engineering strategy? This conversation feels productive because it generates action items and vendor evaluations and pilot project plans. It also, reliably, produces the 50% abandonment rate documented across the industry — because the conversation started at Rung 3 or 4 without verifying that Rungs 1 and 2 were in place.
The second version starts with the infrastructure. Can we close the books in under a week? Do our financial metrics have documented definitions that are consistent across systems? When two people compute the same KPI, do they get the same number? Is the data flowing between our critical systems monitored for integrity? These questions are less exciting. They don't involve demonstrations of generative AI producing impressive-looking analyses in seconds. They involve the mundane work of governing data, defining terms, and building the integrity layer that everything else depends on.
The second conversation is the one that leads to AI value. Not because it's more cautious, but because it's more honest about the sequence. You can't skip the infrastructure and arrive at reliable intelligence. You can skip it and arrive at unreliable intelligence that looks reliable — which is, precisely, the worst possible outcome.
Peter Drucker, decades before AI entered the business vocabulary, observed that the most important thing in communication is hearing what isn't said. The AI tools now flooding the market are extraordinarily good at saying things. They are constitutionally unable to evaluate whether the things they're saying are grounded in reality. That evaluation — the judgment about whether the data is trustworthy, whether the definitions are consistent, whether the analysis reflects the actual business or a statistical artifact of messy inputs — remains entirely a human responsibility, supported by human-designed governance.
AI is leverage. Infrastructure determines what it's leverage on.
The Path Forward
For a company being asked "what's your AI strategy?" by its board, its investors, or its own leadership team, the most credible answer is also the most uncomfortable one: we're building the infrastructure that will make AI investments productive rather than performative.
That answer is uncomfortable because it doesn't include a timeline for a ChatGPT-powered financial analyst or an autonomous forecasting engine. It includes a timeline for governed metric definitions, reconciled data sources, and integration architecture that produces consistent information across systems. These deliverables are not impressive in a board presentation. They are the difference between an AI investment that compounds and one that is quietly abandoned eighteen months later.
The companies that will extract the most value from AI in the next five years are not the ones adopting fastest. They're the ones whose data is cleanest, whose definitions are most consistent, and whose governance is most mature. AI will find them — the tools are becoming ubiquitous and commoditized. What AI won't find for them is the infrastructure that makes its outputs trustworthy. That has to be built on purpose.
Infrastructure before intelligence. It's the only sequence that works.
The AI Question explores why data governance is the prerequisite for AI value. Related: The Integration Tax, The End of One True Number, The Visibility Crisis.