The Report Looked Great. The Numbers Were Wrong.

Jun 11

A vendor's slick AI-generated report showed a dramatic improvement over last year. The improvement was not real. Preventing issues like these comes down to the critical step many AI rollouts skip.

A vendor who runs paid advertising for one of my clients sent over a performance report recently. In the past it came as a simple spreadsheet with a few bullet points underneath. This time it arrived as a polished six-page report, with section headers and a confident narrative running through it. The kind of report that looks like an upgrade.

My first reaction was the one they were hoping for. This looks professional. What good service we are getting.

And then I looked at the numbers…

The improvement was not real

The report showed a significant improvement over the same period last year. Performance up, results trending in the right direction.

But the improvement was not real. The figures the report used for last year were wrong, and they made last year's performance look much worse than it actually was. So when this year's accurate numbers were set beside last year's inaccurate ones, the result was a dramatic improvement that existed only on the page. Once I lined up the correct figures from both years, performance had not changed at all. It was flat.
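The arithmetic behind this kind of error is simple, and a small sketch makes it concrete. The numbers below are hypothetical, chosen only for illustration; they are not the figures from the actual report.

```python
def pct_change(current: float, baseline: float) -> float:
    """Percentage change from baseline to current."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


# Hypothetical figures, for illustration only.
this_year = 500.0           # this year's (accurate) result
last_year_reported = 400.0  # last year's figure as the report showed it (too low)
last_year_actual = 500.0    # last year's figure from an independent record

# The comparison the report made: an understated baseline inflates the change.
reported_change = pct_change(this_year, last_year_reported)

# The comparison against the correct baseline: no change at all.
actual_change = pct_change(this_year, last_year_actual)

print(f"Reported change: {reported_change:.1f}%")
print(f"Actual change:   {actual_change:.1f}%")
```

Nothing about this year's number has to be wrong for the headline to be wrong. An error in the baseline is enough.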

That matters more than it might seem because this vendor had promised improvement. Improvement was the point of the engagement. And the report made flat performance look like the improvement they had promised and not delivered.

A dangerous kind of error

If the error had run the other way, if it had made performance look bad when it was fine, the vendor might have caught it. But a flattering number invites no scrutiny at all. It confirms what everyone hoped was true, so nobody looks closer.

A clean, well-formatted report reads as competence, and competence implies accuracy. Most people will not question that chain of inference any further than they have to. The polish does the persuading, and the data rides along underneath it, unexamined. When the output looks this good, the natural instinct is to trust it, which is exactly where the risk lives.

How I caught it

I caught it for one reason only. I keep my own records.

Before I even had a chance to open the report, my client saw the email summary and remarked on our improvement. But I had just run the month’s numbers that morning and they told a different story.

Before saying anything, I compared the dashboard I built against a spreadsheet containing last year's data. My numbers held. Then, I pulled a report directly from the source platform as a third check. My numbers held again.

None of that was extraordinary. I keep good records because the numbers I hand a client have to be right. Without an independent record to check against, there would have been nothing to catch the mistake with.

What I would have done differently

If this were my business, I would not have sent that report to a single client until the new reporting process had been run alongside the old one for a stretch of time. When you change how a number gets produced, you do not yet know whether the new method agrees with reality. The only way to find out is to run both in parallel and compare them, repeatedly, until you have earned the confidence to retire the old way.

This report went out on the assumption that a new process producing a polished result was a new process producing a correct one. Those are not the same thing, and the gap between them is exactly where this went wrong.

How I phase in any AI workflow

That principle is not specific to advertising reports. It is how I approach any AI workflow I put into a client's business.

The specifics change depending on what we are building. A workflow that drafts client communications gets verified differently than one that pulls and presents data. But the shape is always the same. There is a verification phase, and it does not end when the workflow goes live. For a defined period, the new process runs in parallel with whatever it is replacing, and the outputs get checked against each other until the new method has proven itself.
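For a workflow that pulls and presents data, the parallel-run comparison can be as plain as the sketch below. It is a minimal illustration, not a description of any particular tool: the function name `compare_runs`, the metric names, and the 1% tolerance are all assumptions made for the example.

```python
from math import isclose
from typing import Dict, List


def compare_runs(old: Dict[str, float], new: Dict[str, float],
                 rel_tol: float = 0.01) -> List[str]:
    """Return a list of discrepancies between the old and new process outputs.

    A metric is flagged if it is missing from either side, or if the two
    values differ by more than rel_tol (relative to the larger value).
    """
    problems = []
    for metric in sorted(old.keys() | new.keys()):
        if metric not in old:
            problems.append(f"{metric}: missing from old process")
        elif metric not in new:
            problems.append(f"{metric}: missing from new process")
        elif not isclose(old[metric], new[metric], rel_tol=rel_tol):
            problems.append(f"{metric}: old={old[metric]} new={new[metric]}")
    return problems


# Hypothetical outputs from one reporting period, for illustration only.
manual_numbers = {"clicks": 1000.0, "spend": 250.0, "conversions": 40.0}
automated_numbers = {"clicks": 1000.0, "spend": 300.0}

discrepancies = compare_runs(manual_numbers, automated_numbers)
for line in discrepancies:
    print(line)
```

An empty list for one period proves little. The new process earns trust when the list stays empty across the whole verification period.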

Even after I have confidence in the new workflow, I build in periodic spot checks rather than assuming it will stay correct forever. If you were pulling a number by hand before, you keep pulling it by hand for a while after. The point of automation is not to stop looking. It is to stop looking once you have genuine evidence that you can.
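One way to keep spot checks from quietly lapsing is to pick the items to re-verify at random instead of by convenience. The sketch below assumes each delivered report has some identifier; the function name `pick_spot_checks` and the report IDs are hypothetical.

```python
import random
from typing import List, Optional


def pick_spot_checks(report_ids: List[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Choose up to k reports at random to re-verify by hand.

    Passing a seed makes the selection reproducible, which is useful
    if the choice needs to be documented later.
    """
    rng = random.Random(seed)
    k = min(k, len(report_ids))
    return sorted(rng.sample(report_ids, k))


# Hypothetical report identifiers, for illustration only.
delivered = ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"]
to_check = pick_spot_checks(delivered, k=2, seed=7)
print(to_check)
```

Each selected report then gets the same treatment as during the parallel run: the number is pulled by hand and compared with what the workflow produced.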

This is slower than switching everything over at once and announcing the upgrade. It is meant to be. The speed people want from AI is real, but it belongs on the far side of verification, not before it. You move quickly once you have proven the thing works, not on the hope that it does.

Who is accountable when the tool gets it wrong

On more than one occasion, I’ve seen the blame get shifted to AI when these errors are exposed. The response is often along the lines of, “we are now using AI for this, and unfortunately it produced an error.”

I have noticed this seems to be an acceptable thing to say. AI made the mistake, so the mistake is not quite the person's fault. I understand why it is tempting. But it does not hold up, and I think we should stop letting it pass.

My half-German children grew up hearing the proverb: Ein schlechter Handwerker schiebt die Schuld stets auf sein Werkzeug (a poor craftsman blames his tools), which feels particularly apt here. The tool did not choose to send an unverified report to a client. A person built the process, failed to verify its accuracy, and allowed its output to be delivered to paying clients. The tool made the error, but the people behind the process are accountable for the failure.

That is the real lesson, and it is the thing I do for the businesses I work in. I do not sell AI. Tools are easy to find and getting easier by the week to implement. What I build is the discipline around them. The verification, the parallel running, the spot checks, the clear-eyed judgment about what a new process has actually proven versus what it merely looks capable of.

That discipline is what protects a business when a tool eventually gets something wrong, and it always will, eventually.

A report that looks better than the results is not a small problem. It is a quiet one, which is worse. It does not announce itself. It sits there looking professional while it tells you something untrue, and it counts on the fact that you would rather admire it than check it.

Check it anyway.

Kyrsten Kemper