Verification as Craft
A working note on how to actually check an AI's output, since "just verify it" is easier said than done.
The hallucination sheet ended with one instruction: verify the claims your work rests on. That's correct, and it's also where most advice stops, as if verifying were obvious. It isn't. Checking an AI's output well is a skill with its own moves, and doing it badly can leave you more confident and just as wrong.
The common move
Both feel like verification. Neither reliably is. The question worth asking: what would actually catch a confident error?
What feels like checking, and what actually checks
- Asking the model if it's sure. The context now rewards consistency, so it usually doubles down.
- Skimming for agreement. You stop at the first source that confirms what you already have.
- Checking everything equally. Effort spread evenly means the load-bearing claim gets the same glance as the trivial one.
- Going to an independent source. Something outside the model's output and outside your own assumption.
- Looking for disagreement. Trying to disconfirm the claim is stronger than collecting agreement.
- Triaging by stakes. Spending your verification budget on the few claims the work actually rests on.
Verification isn't a single act of "looking it up." It's a set of judgments: what to check, what counts as independent, and what would change your mind.
The craft, in three moves
Good verification is mostly about spending limited attention where it matters and seeking the right kind of evidence.
1. Triage by load-bearing-ness. You cannot check everything, and pretending you can just spreads your attention too thin to catch anything. Ask of each specific: if this were wrong, would my work fall over? Check those. Let the trivial ones go.
2. Seek independence, not agreement. A second source that traces back to the same origin isn't independent. Asking the same model again isn't independent. The test is whether the check could have come back different.
3. Try to disconfirm. Look for the source that would prove the claim wrong, not the one that nods along. If you can't find disconfirmation after genuinely looking, that's real evidence. Agreement you went looking for is not.
A second model helps only when you set it up to disagree. Pasted in as "is this right?", it tends to confirm. Framed as an independent skeptic, it can surface the doubt the first model smoothed over. But note its agreement is not proof either: two models can share the same wrong assumption. It's a flag-raiser, not a judge.
Try this
Before trusting an output, mark the one or two claims it truly rests on.
For each, find a source that could have come back different, and actively look for the version that says you're wrong.
Use a second model as a skeptic, not a witness. Its disagreement is a reason to dig; its agreement is not a reason to relax.
The principle underneath
Verification isn't looking until you find agreement; it's looking where you'd find the error if there were one. Spend your attention on what's load-bearing, and seek the source that could prove you wrong.