Evaluating AI Output
A 45-minute discipline for professionals reviewing AI-generated work — spotting hallucinations, checking sources, building a verification habit.
Chapters: 7
Duration: ~45 min
Level: Intermediate
Certification: No
Who this is for
For individual contributors (ICs) and managers reviewing AI-generated work: their own, their team's, or a vendor's.
How this course works
- 7 audio-narrated slide chapters · ~45 min of focused content
- Trust trip-wires on every play — what not to cross
- Free verifiable certificate on completion
What you'll walk out with
Specific outcomes from this course — no fluff.
- How to spot hallucinations fast: the textual signals that separate what the AI invented from what it verified
- How to check sources in 30 seconds — the verification routine that scales
- The 3-pass review pattern — facts, voice, audience
- How to build a quality bar that your team can apply uniformly — not just the senior reviewer
- When to ask the AI to critique its own output — and when that's circular
- The verification habit pattern — paired with one tool, daily, until it's muscle memory
- A 2-week practice plan that turns "AI output" into "trusted draft"
Course content
7 chapters · ~45 min
Why AI evaluation is harder than it looks
The fluency illusion is real. Microsoft FAccT 2024 study (404 participants), Anthropic sycophancy paper, Stanford HAI 2026 — confident writing convinces us regardless of accuracy.
Accuracy vs. usefulness
Two different tests. Accurate-and-useless. Useful-and-wrong. The rule: test first for whichever failure is harder to recover from. The Deloitte Australia anchor.
Spotting hallucinations in 3 patterns
Confident fabrication. Plausible detail. Stale fact. NIST taxonomy, OpenAI "why models hallucinate", Vectara HHEM — reasoning models actually hallucinate MORE on long-form text.
Evaluating sources and citations
1,353+ court cases through 2026. Mata v. Avianca to the Sixth Circuit $30K sanctions. Deloitte AU$290K refund. The Nature 72% fake-citation finding. The 3-step citation check.
Spotting bias in outputs
Bloomberg resume study (11% top-rank for Black women), EEOC iTutorGroup $365K settlement, Workday Mobley class action, MMLU-ProX 30-point Swahili gap. Demographic + regional patterns.
Building your verification habit
Lally's 66-day median to automaticity. Moore's overprecision research. BJ Fogg's B=MAP formula. The 5-minute three-step routine that survives week three.
Making it stick: your verification playbook
Three never-skip checks. One escalation rule. The Friday review. Fill in the playbook builder, download the markdown, print it, pin it where you can see it.
Want this delivered inside your organisation?
The course is the starting point. The same content powers a 4-week pilot, an org-wide rollout, or a continuous build engagement — set up on your data, with your team, by Gennoor Tech.