Why Your AI-Generated Analysis Might Be Wrong (And How to Catch It)

You're a policy analyst with a deadline. You've just closed a public survey and have data from 2,500 residents about housing affordability. Now you need to turn it into a briefing note. So you do what makes sense: you upload your Excel file to an AI tool and ask for help.
"Can you give me step-by-step instructions to calculate response rates by age group and identify the top 3 concerns for each region?"
The AI responds with clear, confident instructions. You follow them. You build your briefing note. You send it up the chain. There's a good chance those numbers are wrong.

Why This Matters in Government Work

A small analytical error in the private sector costs money. In government, it misdirects policy, misallocates budgets, and undermines public trust. Your results inform ministerial briefings, shape budget decisions across health authorities, and determine which communities get priority for affordable housing programs. If your response rate calculations are off because of a subtle error in how the AI told you to handle filtered data, you might underestimate rural concerns. That's not just a spreadsheet mistake—it's a policy impact.

A Real Example: What Goes Wrong

Let me show you what this looks like in practice.

Your prompt: "I need to analyze this housing survey data. Can you give me step-by-step Excel instructions to calculate response rates by age group and identify the top 3 concerns for each region?"

The AI responds:
  1. First, filter your data to show only complete responses
  2. Create a pivot table with Age Group in rows and count of responses
  3. Calculate percentages using the total row count
  4. For regional analysis, create a new pivot table with all responses...
  5. To find top concerns, count mentions in the 'Primary Concern' column for each region
But there are three major issues in these five steps.

Step 3 tells you to calculate percentages using "the total row count"—but you filtered the data in Step 1. Which total? The original 2,847 rows or the filtered 2,500 complete responses?
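
To see how much that ambiguity matters, here's a minimal sketch in Python with pandas rather than Excel. The file name, the age-group column name, and the rule for what counts as a "complete" response are all hypothetical placeholders, not details from the real file.

  import pandas as pd

  # Hypothetical file name: the raw survey export with every submitted row
  df = pd.read_excel("housing_survey.xlsx")   # 2,847 rows in this scenario
  complete = df.dropna()                      # assume "complete" = no blank answers (~2,500 rows)

  # Count responses per age group (hypothetical column name)
  counts = complete["Age_Group"].value_counts()

  # Two plausible readings of "the total row count" in Step 3:
  pct_of_all_rows = counts / len(df) * 100        # denominator = 2,847
  pct_of_complete = counts / len(complete) * 100  # denominator = 2,500

  # Dividing by 2,847 instead of 2,500 shrinks every percentage by roughly
  # 12% in relative terms, enough to change how an age group's share of
  # concerns reads in a briefing note.
  print(pct_of_all_rows.round(1))
  print(pct_of_complete.round(1))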

Step 4 says to create a new pivot table "with all responses"—but didn't you just filter to complete responses only? This instruction directly contradicts Step 1.

Step 5 assumes you have a column called "Primary Concern"—but your actual column is named "Housing_Priority_1". The AI is working from assumptions, not your actual file.
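
This is exactly the kind of mismatch a thirty-second check catches before any numbers get calculated. A minimal sketch of that check, again in pandas and again with a hypothetical file name:

  import pandas as pd

  df = pd.read_excel("housing_survey.xlsx")  # hypothetical file name

  # Columns the AI's instructions assume exist
  assumed = ["Primary Concern", "Age Group", "Region"]
  missing = [col for col in assumed if col not in df.columns]

  if missing:
      print("The instructions reference columns that aren't in this file:", missing)
      print("Columns that actually exist:", list(df.columns))
      # e.g. the real column turns out to be 'Housing_Priority_1', not 'Primary Concern'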

Follow the instructions as written and the problems compound. The response rate calculations use the wrong denominator (the full 2,847-row sample instead of the 2,500 complete responses). The regional breakdowns quietly include the incomplete responses you meant to exclude. You hit errors when you try to reference a "Primary Concern" column that doesn't exist. And once you patch the column name by hand, your calculations become inconsistent: some use filtered data, some use unfiltered.

Your briefing note now contains statistics that are subtly, but consequentially, wrong. The numbers look plausible. They're in the right ballpark. But they're off just enough to potentially influence decision-making in the wrong direction.

Why AI Makes These Mistakes

AI tools generate steps based on patterns in their training data—not by examining your specific spreadsheet or testing whether the instructions work.

The AI generates later steps without fully tracking what it said in earlier ones. It "knows" you filtered your data in Step 1, but it doesn't consistently apply that constraint when it writes Steps 3 and 4. It assumes standard column names and data structures, which is why Step 5 references a "Primary Concern" column your file doesn't have. And because it produces instructions in sequence, without executing them or checking the output, it never sees the result of one step before writing the next.

The Cost of Unverified AI Analysis

In government work, we're accountable for accuracy. When you present analysis that shapes policy decisions, you're putting your professional reputation behind those numbers. You're also influencing decisions that affect real communities and substantial public resources. A miscalculated response rate means underestimating the concerns of specific populations. An incorrect demographic breakdown misdirects program funding. An unchecked assumption about your data's structure can invalidate the entire analysis.

What Comes Next

These problems are predictable and preventable. In the next post, I'll show you a specific technique called "Test First, Implement Second" that forces AI to verify its instructions against your actual data before you implement anything.

Instead of asking AI for instructions and hoping they work, you'll ask it to test its proposed approach on your real data first, then provide only the steps it's verified. 
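
As a rough preview (a sketch only, with hypothetical file and column names, not the full technique from Post 2): before implementing anything, you run a short check that confirms the assumptions the proposed steps depend on.

  import pandas as pd

  # Verification pass, run before any analysis steps are implemented.
  # File and column names are hypothetical placeholders.
  df = pd.read_excel("housing_survey.xlsx")

  checks = {
      "total rows": len(df),
      "rows with no blank answers": len(df.dropna()),
      "columns in the file": list(df.columns),
      "'Housing_Priority_1' present": "Housing_Priority_1" in df.columns,
  }

  for name, result in checks.items():
      print(f"{name}: {result}")

  # Only once these results match what the proposed steps assume (which
  # total? which columns?) do you go ahead and build the analysis.

A couple of minutes spent on a check like this is the difference between instructions that sound right and instructions that have actually been tested against your file.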

Don't assume AI-generated instructions are correct just because they sound confident and technically plausible. Verification is straightforward and worth the extra step.

Coming up in this series:

  • Post 2: The "Test First, Implement Second" technique with complete prompt templates
  • Post 3: Six common ways AI gets your numbers wrong (and how to spot them)
  • Post 4: Your government analyst's verification checklist