Six Ways AI Gets Your Numbers Wrong (And How to Catch Them)
Even when you use the "Test First, Implement Second" technique from Post 2, AI can still make subtle errors, and those errors follow predictable patterns. This post covers the six patterns I've seen most often in government analytical work: how they show up, what they cost you, and how to catch them before they make it into your reports or briefing notes.
Here are the six patterns:
- The Invisible Filter Problem – AI filters data then forgets that filter exists in later calculations
- Column Name Hallucination – AI assumes common column names instead of using actual headers
- Denominator Drift – Percentage calculations use the wrong baseline after data transformations
- Aggregation Level Mismatch – Instructions jump between data levels (regional to provincial) without proper grouping steps
- Formula Propagation Errors – Formulas don't adjust correctly when copied down through a dataset
- The "Works in Theory, Fails in Practice" Gap – Instructions work for clean data but break with real-world messiness
Pitfall #1: The Invisible Filter Problem
The pitfall is when AI tells you to filter your data in Step 1, then forgets about that filter in later steps.
Here's what it looks like in practice: Step 1 says "Filter to show only 'Complete' responses." Step 2 creates summary statistics. Then Step 5 says "Calculate percentage using total row count"—but it uses the pre-filter count.
AI generates each instruction somewhat independently. When writing Step 5, it doesn't fully account for the state change from Step 1. It remembers the filter exists but doesn't consistently apply that constraint to all calculations.
After receiving instructions, manually trace through the logic: "If I complete Step 1, what's my new row count? Does Step 5 use that number or the original total?"
As a practical example, if you're analyzing public engagement responses, you might filter out duplicate submissions in Step 1, but later calculate participation rates using the original count that includes duplicates. You've just overstated response rates in your report to the minister.
Any time you see "total" or "all" in a calculation step, ask yourself: "Total of what? The filtered data or the original dataset?"
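To make this concrete, here's a minimal Python sketch with invented survey data (the column names, regions, and statuses are made up). The wrong version divides by the pre-filter row count; the right version uses the filtered count:

```python
# Toy data standing in for a public-engagement export (values invented).
responses = [
    {"region": "North", "status": "Complete"},
    {"region": "North", "status": "Complete"},
    {"region": "South", "status": "Complete"},
    {"region": "South", "status": "Incomplete"},
]

# Step 1: filter to 'Complete' responses only.
complete = [r for r in responses if r["status"] == "Complete"]

# Later step: what share of complete responses came from the North?
north = [r for r in complete if r["region"] == "North"]

wrong_pct = len(north) / len(responses) * 100  # 50.0 — pre-filter denominator
right_pct = len(north) / len(complete) * 100   # 66.7 — current filtered state
```

The two answers differ by a third, and nothing about the wrong one looks obviously broken until you trace the row counts.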
Pitfall #2: Column Name Hallucination
The pitfall is when AI references columns that don't exist in your file, using common or expected names instead of your actual column headers.
Here's a real example: Your file has "Respondent_Age_Category" but AI instructions say "Select the 'Age Group' column." You manually correct it for Step 3, but Step 7 still references "Age Group."
AI defaults to common patterns from its training data. "Age Group" appears more frequently than your specific naming convention, so that's what it assumes you have.
The verification prompt from Post 2 helps because it forces the AI to tell you what columns exist and their exact names. But also, when implementing steps, if you have to manually correct a column name once, search the entire instruction set for other instances of that assumed name.
As a practical example, if you're merging datasets from different ministries, revenue data might use "FiscalYear" while expenditure data uses "Fiscal_Yr". AI assumes both use "Fiscal Year". Your join fails or matches wrong rows, invalidating your budget analysis.
Before following any instruction that references a column, verify that the exact column name exists in your file. Don't assume AI got it right just because it's close.
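If you'd rather check this programmatically than by eye, here's a small Python sketch (the CSV snippet and column names are invented) that compares the columns the instructions reference against the headers actually in the file:

```python
import csv
import io

# Stand-in for your actual file; only the header row matters here.
raw = "Respondent_Age_Category,Region\n35-44,North\n"
reader = csv.DictReader(io.StringIO(raw))
actual_columns = reader.fieldnames  # the exact headers in the file

# Column names the AI's instructions referenced (one is hallucinated).
referenced = ["Age Group", "Region"]

# Anything in `missing` means: stop, and fix every instance in the
# instruction set, not just the first one you noticed.
missing = [c for c in referenced if c not in actual_columns]
```

Running this on a real file means swapping `io.StringIO(raw)` for `open("your_file.csv")`.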
Pitfall #3: Denominator Drift
The pitfall is when percentage calculations use the wrong baseline because AI doesn't track what the "total" should be after each transformation.
I saw this recently with a colleague: you start with 3,000 survey responses. Step 2 filters to BC residents only, leaving 2,400 responses. Step 4 further filters to age 18+, leaving 2,100 responses. Then Step 6 says "Calculate regional percentages" and assumes 3,000. The denominator should be 2,100 (the current filtered state), not 3,000 (the original dataset).
Each filter narrows your dataset, but AI may reference the original total when calculating proportions instead of tracking the current filtered total.
For any percentage calculation, ask: "What should the denominator be at this point in the workflow given all previous filters?"
If you're analyzing program uptake rates, you might filter to eligible participants in Step 1, then to those who completed applications in Step 2. But then you calculate uptake as completions ÷ total population instead of completions ÷ eligible population. You risk understating a program's success and contributing to budget cuts for what is actually an effective program.
You've got to double-check. Write down your row count after each filter. When you hit a percentage calculation, confirm the denominator matches your current row count, not an earlier one.
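Here's that habit as a Python sketch, using the row counts from the survey example above (the regional count of 630 is invented for illustration):

```python
# Track the row count after every filter, by name.
counts = {"original": 3000}
counts["bc_residents"] = 2400  # after Step 2 filter
counts["adults_18plus"] = 2100  # after Step 4 filter

# Step 6: regional percentage for a region with 630 responses (invented).
region_count = 630

wrong = region_count / counts["original"] * 100       # 21.0 — denominator drift
right = region_count / counts["adults_18plus"] * 100  # 30.0 — current state
```

Writing the counts down with names, rather than keeping "the total" in your head, is what makes the drift visible.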
Pitfall #4: Aggregation Level Mismatch
This pitfall happens when instructions mix different levels of aggregation without proper grouping steps.
Here's what this means: Let’s say that in Step 3 the AI creates a summary of the data by health authority (5 regions). Step 5 says "Add a column showing percentage of provincial total." AI doesn't specify that you need to sum health authority values first before calculating provincial percentages.
The AI might understand the conceptual goal, but still might skip intermediate aggregation steps that aren't explicit in the workflow.
Check whether each calculation specifies what level of data it's operating on. If jumping from regional to provincial, is there a clear aggregation step?
In budget analysis, say you're summing program expenditures by department, and a later calculation requires provincial totals. The AI might not explicitly state that you need to sum the department values first before calculating province-wide percentages. If you use the wrong totals, your budget variance analysis will be off.
Track manually: Draw a simple hierarchy (program → department → province). For each calculation, mark which level you're at. If you skip a level, you've probably skipped a necessary aggregation step.
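The hierarchy-tracking idea can be sketched in Python with invented expenditure figures. Each level of the hierarchy gets its own explicit aggregation step before any percentage is calculated:

```python
# Invented program-level expenditure rows (program -> department -> province).
rows = [
    {"department": "Health", "program": "A", "spend": 40},
    {"department": "Health", "program": "B", "spend": 60},
    {"department": "Education", "program": "C", "spend": 100},
]

# Explicit step: aggregate program level up to department level.
dept_totals = {}
for r in rows:
    dept_totals[r["department"]] = dept_totals.get(r["department"], 0) + r["spend"]

# Explicit step: aggregate department level up to provincial level.
provincial_total = sum(dept_totals.values())

# Only now is "percentage of provincial total" a safe calculation.
dept_share = {d: t / provincial_total * 100 for d, t in dept_totals.items()}
```

If you can't point at the line that performs a given level jump, the instructions probably skipped it.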
Pitfall #5: Formula Propagation Errors
This pitfall describes a situation where AI might tell you to create a formula in one cell in a spreadsheet, then "copy down," but the formula doesn't adjust correctly for your data structure.
Here's what this looks like: The AI says "In cell D2, enter: =B2/B$500" (intended to divide each value by a fixed total in row 500), then "Copy formula down to D100." If B500 is empty or your data actually ends at row 450, the formula breaks.
AI assumes standard spreadsheet behavior but doesn't verify that cell references, absolute vs. relative references, or named ranges match your actual file structure.
Before copying any formula down, test it in just one additional row. Does it produce the expected result? Check both the formula itself and the calculated value.
Say you're building a financial model calculating cost-per-unit across multiple programs. You might copy formulas down, but denominators could vary by program and AI might have assumed a fixed cell reference. In this case, your per-unit costs will be wrong for most programs, leading to incorrect budget allocation recommendations.
After creating a formula, copy it to one other row and examine the formula (not just the result). Do the cell references adjust the way you expect?
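The same bug has a Python analog worth seeing side by side with the spreadsheet version. The formula `=B2/B$500` hard-codes where the total lives; the safer move is to compute the total from the data that actually exists (values below are invented):

```python
# Values standing in for column B. The AI assumed data runs to row 500,
# but this dataset ends much earlier.
values = [120.0, 80.0, 200.0]

# Spreadsheet version of the bug: =B2/B$500 divides by a cell past the
# end of the data, which is empty. The safe equivalent derives the
# denominator from the actual data, not from an assumed fixed address.
total = sum(values)
shares = [v / total for v in values]

# Same sanity check as testing one extra row in the sheet:
# the shares should sum to 1.
assert abs(sum(shares) - 1.0) < 1e-9
```

In a spreadsheet, the equivalent check is selecting the denominator cell and confirming it sits on your real last row of data, not where the AI guessed the data ends.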
Pitfall #6: The “Works in Theory, Fails in Practice” Gap
The pitfall occurs where the AI's instructions are logically sound for clean, idealized data but fail when encountering real-world messiness.
Here's what this looks like: the AI says "Calculate average response time by dividing total hours by number of requests." Your data includes some "In Progress" requests with no completion time yet. The formula produces errors or wrong averages because it's trying to calculate with blank cells or text values.
AI generates instructions based on idealized datasets and may not anticipate missing values, unexpected formats, or data quality issues in your actual file.
The verification prompt from Post 2 helps because it asks AI to "check for any obvious data quality issues," but you should also mentally test each instruction: "What if this cell is blank? What if this value is text instead of a number?"
Say you're analyzing service delivery metrics. Some cases are ongoing, some are closed, some are transferred between work units. AI might assume all cases have completion dates. Your average processing time calculation either errors out or silently excludes active cases, misrepresenting actual workload and potentially affecting human resourcing decisions.
Before implementing any formula, scan your actual data for blanks, text in number columns, dates in weird formats, or other quality issues that might break the formula.
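Here's a small Python sketch of handling that messiness explicitly (the values are invented stand-ins for a service-request export). The point is to count what you exclude instead of letting it vanish silently:

```python
# Invented completion times: some blank, some text like "In Progress".
raw_hours = ["4.5", "", "In Progress", "2.0", "3.5"]

clean = []
skipped = 0
for v in raw_hours:
    try:
        clean.append(float(v))
    except ValueError:
        skipped += 1  # blank or non-numeric: count it, don't hide it

# Average over valid values only — and report `skipped` alongside it,
# so readers know 2 of 5 rows were excluded.
avg = sum(clean) / len(clean)
```

A naive `sum / count` over the raw column would either crash on the text values or, in a spreadsheet, quietly treat blanks as zero and drag the average down.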
The Compounding Effect
The consequence of these pitfalls is a compounding effect. If Step 2 uses the wrong denominator (Pitfall #3), and Step 5 builds on Step 2's results, and Step 8 summarizes Step 5, you're now three levels removed from the original error. By the time you spot the problem, you've built an analysis on a faulty foundation. This is why the verification techniques from Post 2 matter: without them, these compounding errors stay hidden.
Validate intermediate results, not just final outputs. After each major transformation—filter, aggregation, calculation—spot-check a few rows to ensure results make intuitive sense.
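One lightweight way to do this is to carry expected checkpoints through the workflow and fail fast when an intermediate step drifts. A minimal Python sketch, reusing the row counts from the survey example (the filter fields are synthetic):

```python
# Expected row counts at each checkpoint, written down before running.
expected = {"after_bc_filter": 2400, "after_adult_filter": 2100}

# Synthetic data built so the first 2400 rows are BC residents and the
# first 2100 are adults — a stand-in for real survey responses.
rows = [{"resident_bc": i < 2400, "adult": i < 2100} for i in range(3000)]

rows = [r for r in rows if r["resident_bc"]]
assert len(rows) == expected["after_bc_filter"], "BC filter count drifted"

rows = [r for r in rows if r["adult"]]
assert len(rows) == expected["after_adult_filter"], "Age filter count drifted"
```

In a spreadsheet the equivalent is jotting the row count after each filter and comparing it against what you expected before moving on.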
Quick Verification Questions
Before I implement AI instructions, I run this mental checklist. I pause and ask myself these six questions:
1. If I filtered data earlier, do later calculations account for that?
2. Do these exact column names exist in my file?
3. Am I dividing by the right total given my current data state?
4. Am I operating at the right level (individual, group, region, province)?
5. Will this formula work if I copy it down? What if cells are blank?
6. Will this work with my actual messy data, or just with clean theoretical data?
If you can't confidently answer "yes" to all six questions, stop and verify before proceeding.
A final note: AI models will continue improving, and some of these specific error patterns may become less common over time. But that's not the point. The point is developing a verification mindset—the habit of checking filters, tracing transformations, and validating assumptions before your analysis becomes part of the public record. The specific patterns change. The need for verification doesn't.
What’s Next
Now you know the six most common error patterns. In Post 4, I'll give you a comprehensive verification checklist you can use before, during, and after any AI-assisted analysis—the systematic checks that ensure your numbers are defensible when your director asks questions or when results become part of the public record.
Coming up: Post 4 covers pre-analysis, during-analysis, and post-analysis checks, plus red flags that demand immediate re-verification.