Numbers You Can Defend: A Gov Analyst's Verification Checklist

We've covered why the AI makes errors (Post 1), how to prevent them with better prompts (Post 2), and the six most common error patterns (Post 3). All of that knowledge is useful. None of it helps when you're under deadline pressure and skipping steps anyway.

That's what this checklist is for. It's a verification routine you can follow for any AI-assisted quantitative analysis — the systematic checks that make your numbers defensible before they land on your director's desk.

Pre-Analysis: Before You Start

Before you ask the AI for analysis instructions, get four things clear.

First, ask the AI to identify your actual column names. Don't let it assume — make it tell you what it sees. Second, confirm the data types for your key columns. Is "ResponseDate" actually formatted as a date, or is it text that looks like a date? Third, identify data quality issues upfront. Blanks, outliers, inconsistencies — find them now, not after you've built your analysis. Fourth, clarify your total population or sample size. What's your denominator? Make it explicit before any calculations happen.
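If you script your checks rather than working entirely in a spreadsheet, these four pre-analysis questions can be answered in a few lines. A minimal sketch in Python, using a hypothetical survey extract (the column names and rows here are illustrative, not from a real system):

```python
import datetime

# Hypothetical survey extract; in practice this comes from your real file.
rows = [
    {"ResponseDate": "2024-04-02", "Region": "Fraser", "Status": "Complete"},
    {"ResponseDate": "2024-04-03", "Region": "Interior", "Status": "Complete"},
    {"ResponseDate": "", "Region": "Fraser", "Status": "Incomplete"},
]

# 1. Actual column names: read them from the data, don't assume.
columns = sorted(rows[0].keys())

# 2. Data type check: does every non-blank ResponseDate parse as a date?
def parses_as_date(value):
    try:
        datetime.date.fromisoformat(value)
        return True
    except ValueError:
        return False

bad_dates = [r for r in rows if r["ResponseDate"] and not parses_as_date(r["ResponseDate"])]

# 3. Data quality upfront: blanks per column.
blanks = {c: sum(1 for r in rows if r[c] == "") for c in columns}

# 4. The denominator: make the total explicit before any calculation happens.
total_rows = len(rows)

print(columns)     # ['Region', 'ResponseDate', 'Status']
print(blanks)      # {'Region': 0, 'ResponseDate': 1, 'Status': 0}
print(total_rows)  # 3
```

The point isn't the tooling; it's that all four answers are written down before any metric gets computed.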

Once you know your data structure, define your metrics explicitly. "Response rate" could mean complete surveys ÷ total invited, or complete surveys ÷ total started — the AI won't know which you need unless you say so. The same applies to filters: be precise about whether percentages use only complete responses or all responses including incomplete ones. And don't assume the AI knows your business rules. Fiscal year boundaries, eligibility criteria, time period definitions — spell them out.

Here's what this looks like in practice. Instead of "calculate program uptake rates," say: "Calculate program uptake as completed applications ÷ eligible residents (age 18+, BC residents, household income below $50K). Exclude applications marked 'Withdrawn' or 'Duplicate.'"
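That precise definition translates directly into code, which is one way to confirm the AI understood it. A sketch, with hypothetical application records and an assumed eligible-population figure:

```python
# Hypothetical application records; field names are illustrative.
applications = [
    {"status": "Approved",  "age": 34, "province": "BC", "income": 42000},
    {"status": "Withdrawn", "age": 29, "province": "BC", "income": 38000},
    {"status": "Approved",  "age": 17, "province": "BC", "income": 45000},  # under 18
    {"status": "Approved",  "age": 51, "province": "AB", "income": 30000},  # not BC
]
eligible_residents = 1200  # the explicit denominator, stated up front

def is_countable(app):
    # Encode every business rule from the prompt, not just the obvious ones.
    return (
        app["status"] not in ("Withdrawn", "Duplicate")
        and app["age"] >= 18
        and app["province"] == "BC"
        and app["income"] < 50000
    )

countable = sum(1 for app in applications if is_countable(app))
uptake = countable / eligible_residents
print(f"{countable} countable applications, uptake = {uptake:.2%}")
```

Only one of the four records survives all the rules, which is exactly the kind of thing a vague prompt would have hidden.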

During-Analysis: As You Implement Steps

Don't wait until the end to check your work — problems compound quickly once they're built into your analysis.

As you work through the AI's steps, four things can quietly go wrong. Later steps may not account for filters applied earlier — if Step 1 filtered to complete responses, does Step 5's calculation use the filtered count? Column names may drift — if the AI corrected itself from "Age_Group" to "AgeCategory" in Step 3, did it update all later references? New columns you've created may not be referenced correctly downstream — you created "Cost_Per_Unit" in Step 4, but does Step 7 reference exactly that name? And denominators may no longer reflect your current data state — after three rounds of filtering, which total are you actually dividing by?
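The column-name drift in particular is cheap to catch mechanically. A small sketch, assuming you can list the columns that actually exist after your earlier steps and the names later steps refer to:

```python
# Columns that exist after your earlier steps (including ones you created).
actual_columns = {"AgeCategory", "Region", "Status", "Cost_Per_Unit"}

# Columns the AI's later steps reference. "Age_Group" is the stale name
# from before the AI corrected itself, so this check should flag it.
referenced_columns = {"Age_Group", "Region", "Cost_Per_Unit"}

missing = referenced_columns - actual_columns
if missing:
    print(f"Stale or missing column references: {sorted(missing)}")
```

Running this after each renaming or column-creation step takes seconds and catches the drift before a later formula silently references nothing.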

Four spot-checks catch problems before they compound. After filtering, confirm your new row count makes sense — if you filtered to BC residents and went from 3,000 to 2,400 rows, ask yourself whether 80% BC residency is plausible for your dataset. After aggregating, check that subtotals add up to your expected totals; regional figures should sum to the provincial total, and if they don't, something's wrong. After calculating percentages, verify they sum to roughly 100% — demographic breakdowns that land at 87% or 134% aren't a rounding issue; they're a signal to stop and investigate. After creating pivot tables, manually verify a small subset by filtering to one region and counting responses for one demographic group yourself. If that number doesn't match your pivot table, don't proceed.
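The subtotal and percentage checks amount to two assertions. A sketch with hypothetical regional figures (the regions and counts are made up for illustration):

```python
regional = {"Fraser": 820, "Interior": 410, "Island": 390, "Northern": 180, "Coastal": 600}
provincial_total = 2400

# Subtotals must sum to the reported total.
assert sum(regional.values()) == provincial_total, "Regional figures don't sum to the provincial total"

# Percentage breakdowns should land near 100%; small rounding drift is fine,
# 87% or 134% is not.
shares = {region: count / provincial_total * 100 for region, count in regional.items()}
total_share = sum(shares.values())
assert abs(total_share - 100) < 0.5, f"Percentages sum to {total_share:.1f}%, investigate"
```

In a spreadsheet, the equivalent is a SUM cell beside your subtotals and a running total beside your percentage column; the mechanism matters less than running the check every time.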

Before copying any formula down through your dataset, test it on two or three rows first. A formula that looks correct can behave differently at scale — especially with absolute references. If you're dividing by a provincial total in cell B500, lock the reference as $B$500 (or at least B$500) before you copy down; leaving it relative shifts the reference row by row and produces silent errors that only show up later. Also confirm how the formula handles blank cells. Depending on how it's written, a blank might return an error, a zero, or a wrong calculation — any of which could corrupt downstream results. If you're using a lookup function (VLOOKUP, for example), verify that your lookup range actually contains the values you're matching. Looking up fiscal years in a table that only contains calendar years won't throw an error; it'll just return wrong results.
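That lookup trap is worth a pre-check of its own: before matching, confirm the values you're looking up actually appear in the lookup table. A sketch with a hypothetical budget table keyed by fiscal year:

```python
# A lookup table keyed by fiscal year, and the values a formula will look up.
budget_by_fiscal_year = {"FY2022/23": 1.8, "FY2023/24": 2.1}
years_to_look_up = ["FY2023/24", "2023"]  # "2023" is a calendar year, not a fiscal year

# A spreadsheet lookup would quietly return a wrong or #N/A result for "2023";
# checking membership first surfaces the mismatch before it corrupts anything.
unmatched = [y for y in years_to_look_up if y not in budget_by_fiscal_year]
print(unmatched)  # ['2023']
```

One unmatched key here means one silently wrong row per occurrence in your real data, so resolve the mismatch before any lookups run.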

Post-Analysis: Before Using Results

Before these numbers go into a briefing note, run four sanity checks.

Do the results pass a common sense test? If 95% of respondents are age 80+, something's wrong with either your data or your analysis. Are totals mathematically consistent? Regional subtotals must sum to the provincial total — if they don't, you have a logic error to find before you go any further. Are trends plausible? A response rate that drops from 40% to 5% year over year is possible, but it demands an explanation before you report it. And do different aggregation paths produce the same result? Calculating provincial totals by summing regions should give you the same number as calculating directly from raw data. If it doesn't, your methodology has a gap somewhere.
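The different-aggregation-paths check is mechanical enough to script. A sketch, using hypothetical response counts, that computes the provincial total two ways and insists they agree:

```python
rows = [
    {"region": "Fraser", "responses": 300},
    {"region": "Fraser", "responses": 520},
    {"region": "Interior", "responses": 410},
]

# Path 1: aggregate by region, then sum the regional subtotals.
by_region = {}
for r in rows:
    by_region[r["region"]] = by_region.get(r["region"], 0) + r["responses"]
path1 = sum(by_region.values())

# Path 2: sum straight from the raw rows.
path2 = sum(r["responses"] for r in rows)

assert path1 == path2, "Aggregation paths disagree: methodology gap"
```

If the assertion fails, a filter, join, or deduplication step applied on one path but not the other is usually the culprit.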

Pick at least two of the following four cross-validation approaches before you call the analysis done.

The manual sample test: select five to ten rows and calculate the result yourself. Does it match your formula or pivot table? The different path test: calculate the same metric using a different method — if you built a pivot table, try COUNTIF formulas instead. If you get a different answer, one of the methods is wrong. The colleague review: can someone else follow your steps and reproduce your results? If not, your methodology isn't defensible regardless of whether the numbers are correct. The documentation test: can you explain each step and why it's the right approach? Uncertainty here is a signal to keep checking, not to proceed.
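The different path test is easy to demonstrate: tally the same thing with two unrelated mechanisms and compare. A sketch, with a hypothetical column of response statuses, contrasting a pivot-table-style tally with a COUNTIF-style direct count:

```python
from collections import Counter

responses = ["Complete", "Complete", "Incomplete", "Complete", "Incomplete"]

# Method 1: a pivot-table-style tally of every status at once.
pivot = Counter(responses)
complete_via_pivot = pivot["Complete"]

# Method 2: a COUNTIF-style direct count of one value.
complete_via_countif = sum(1 for r in responses if r == "Complete")

# If these differ, one of the two methods is wrong.
assert complete_via_pivot == complete_via_countif
```

The spreadsheet analogue is exactly as the text describes: a COUNTIF formula beside your pivot table, checked cell by cell.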

Four additional checks matter specifically in public sector contexts.

Fiscal year alignment: if you're working with budget data, confirm all calculations use the same fiscal year definition. BC's fiscal year runs April through March, not January through December — a common source of error when combining data from different sources. Geographic consistency: verify that regional boundaries are applied consistently throughout. Health authorities, school districts, and ministries don't always share the same geographic definitions, and mixing them produces results that can't be compared. Time period equivalence: if you're comparing across time, confirm the periods are actually comparable — the same number of months, the same seasonal pattern. Comparing summer service volumes to winter ones can mislead even when the math is correct. Policy alignment: check that your calculation methods match how your ministry officially defines these metrics. If your ministry's service plan defines "program completion rate" as completions ÷ enrollments, calculating it as completions ÷ eligible population gives you a different number with a different meaning — even if both are technically defensible.
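The fiscal year check in particular is worth encoding once and reusing. A sketch of the April-through-March mapping (the FY label format is an assumption; match whatever convention your ministry uses):

```python
import datetime

def bc_fiscal_year(d):
    # BC's fiscal year runs April through March: April 2023 through
    # March 2024 belongs to fiscal year 2023/24.
    start = d.year if d.month >= 4 else d.year - 1
    return f"FY{start}/{str(start + 1)[-2:]}"

print(bc_fiscal_year(datetime.date(2024, 3, 31)))  # FY2023/24
print(bc_fiscal_year(datetime.date(2024, 4, 1)))   # FY2024/25
```

Applying one function like this to every date column before combining sources is what prevents a January-through-December dataset from silently contaminating a fiscal-year analysis.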

Red Flags That Demand Immediate Re-Verification

Five results should stop you immediately. Don't proceed past any of these until you know what caused them.

Unexpected totals: if regional values don't sum to the provincial total, something is wrong with your aggregation logic. Percentages that don't add up: demographic breakdowns that land at 87% or 134% aren't a rounding issue — they indicate a denominator problem or a filter inconsistency. Implausible outliers: if one health authority shows ten times the average, it could be real, but verify before reporting. A data entry error and a genuine anomaly can look identical at first glance. Changed results without changed data: if you re-run the same steps and get different numbers, your formulas are likely referencing dynamic ranges that have shifted. Inconsistency across similar metrics: a response rate of 65% for one question and 12% for another on the same survey almost certainly means the calculations aren't using the same base — find it before the briefing note does.
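The implausible-outlier flag can also be automated as a first pass. A sketch with hypothetical service volumes; it compares each region to the median rather than the mean, since one extreme value drags the mean up and can hide itself:

```python
import statistics

volumes = {"Fraser": 120, "Interior": 110, "Island": 95, "Northern": 1150, "Coastal": 105}

# Use the median as the benchmark so the outlier can't inflate its own yardstick.
typical = statistics.median(volumes.values())
flagged = [region for region, v in volumes.items() if v >= 10 * typical]
print(flagged)  # ['Northern']
```

A flag here doesn't mean the number is wrong; it means it needs verification before it appears in anything your director signs.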

When to Escalate or Seek Expert Review

Completing this checklist doesn't mean every analysis is ready to go. Some warrant a second set of eyes regardless of how carefully you've worked through the steps.

Seek subject matter expert consultation if your results will directly inform ministerial decisions or public announcements, if the analysis involves complex statistical methods like regression or significance testing, if you're combining datasets from multiple sources with different definitions, if results contradict previous reporting without a clear explanation, or if you're working with sensitive data including personal information or financial details.

Seek methodological review if you're creating a metric or KPI that hasn't been reported before, if you're comparing data across time periods with known methodological changes, if sample sizes are small enough that results may not be statistically robust, or if you're making assumptions about missing or incomplete responses. A second opinion before publishing is easier to manage than a correction afterward.

Documentation: Your Audit Trail

Keep a record of five things as you work: the exact prompts you used with the AI, any clarifications or corrections you made to its instructions, assumptions you made along the way (for example, how you treated blank responses), intermediate results rather than just final outputs, and the validation checks you ran and what they found.

This isn't bureaucratic overhead. Your analysis may need to be replicated by a colleague, reviewed by a supervisor, or produced in response to an FOI request. If you're away when questions arise, your documentation is what allows someone else to pick up where you left off. And when something goes wrong — which it will eventually — a clear record of your steps is what lets you and your work unit learn from it rather than repeat it.

The Five-Minute Final Check

Before anything leaves your desk, run through these five checks.

  1. The math check: do subtotals sum to totals? Do percentages add to roughly 100%?
  2. The sample check: manually verify three to five rows or calculations against your formulas.
  3. The logic check: do the results make sense given what you know about the topic?
  4. The consistency check: are you using the same filters, definitions, and time periods throughout?
  5. The "can I explain this?" check: if your director asks how you got these numbers, can you walk through each step with confidence?

If any of these fail, stop and re-verify before the results go anywhere.

The first few times through this checklist, it'll feel slow. That's normal. After a few analyses you'll start catching errors instinctively, and you'll develop a sense for where the AI tends to make mistakes with your specific data. Keep the checklist handy anyway — particularly when you're working with unfamiliar data or under pressure to move quickly. Those are exactly the conditions where steps get skipped. They're also the conditions where errors make it into briefing notes.

AI is a useful tool for quantitative analysis, but you remain accountable for the accuracy of your results. A few minutes of systematic verification is easier to manage than reworking a briefing note after your director questions the numbers — or worse, after the numbers have already shaped a decision.

  • Post 1: Why Your AI-Generated Analysis Might Be Wrong (and How to Catch It)

  • Post 2: The Test First, Implement Second Technique

  • Post 3: Six Ways AI Gets Your Numbers Wrong

  • Post 4: Numbers You Can Defend — Gov Analyst's Verification Checklist