Builder · Advanced

MLOps for LLMs

A 65-minute advanced course for ML and platform engineers — evaluation in production, drift, observability, A/B testing, and incident response.

65 min·9 chapters·Technical practitioner·Free

Last updated: 2026-05-19

What you'll learn

By the end of this course you'll be able to:

  • Explain how LLMOps differs from classical MLOps, and which practices still carry over
  • Apply production evaluation patterns: online evals, sampling, and human-in-the-loop review
  • Detect drift in LLM systems: input, output, and behavioral drift
  • Apply cost optimization patterns that survive past the first quarter
  • Version prompts, models, and evals together, and roll them out safely
  • Build observability with OpenTelemetry, MLflow, and LangSmith, and run incident response when something breaks at 3am

Who this is for

ML engineers, platform engineers, and SREs operating LLM systems in production. Especially valuable for teams that have shipped at least one LLM feature and are now staring at the second one — knowing that ad-hoc evals, missing traces, and untracked prompt changes will not survive the next outage or the next model upgrade.

Curriculum

9 chapters · 2 hands-on exercises · capstone challenge

Each chapter ends with the learning objectives ticked off. Quizzes are auto-graded with feedback; exercises are open-ended and produce artifacts you can take to your team.

1. LLMOps vs MLOps

7 min
  • Identify what changes when the model is generative, non-deterministic, and third-party
  • Map classical MLOps practices that still apply (and the ones that don’t)

2. Model evaluation in production

8 min · QUIZ
  • Run online evals on sampled production traffic without leaking PII
  • Combine LLM-as-judge with human review on the cases that matter most

3. Drift detection for LLM systems

8 min · EXERCISE
  • Detect input distribution shift, output drift, and behavioral drift separately
  • Set actionable alert thresholds instead of cosmetic dashboards

4. Cost optimization patterns

7 min · QUIZ
  • Apply prompt compression, caching, and model-routing to control token spend
  • Decide when a smaller model + retrieval beats a larger model end-to-end

5. Versioning prompts and models together

7 min · EXERCISE
  • Treat prompts, model versions, and eval suites as one versioned artifact
  • Roll changes with canaries, shadowing, and one-command rollback

6. Observability with traces

8 min · QUIZ
  • Instrument LLM apps with OpenTelemetry GenAI semantic conventions
  • Wire MLflow or LangSmith for trace search, replay, and eval linking

7. A/B testing LLM systems

6 min
  • Design A/B tests that account for non-determinism and slow user feedback
  • Avoid the false-positive trap of judging on a single offline metric

8. Incident response for LLM failures

7 min
  • Classify LLM incidents (regression, drift, abuse, vendor outage) and triage each
  • Run a clean post-mortem that produces durable eval cases, not just promises

Capstone: Your LLMOps runbook

7 min
  • Author an LLMOps runbook covering evals, drift, cost, rollouts, and incidents
  • Define the on-call response pattern your platform team will operate against

Capstone deliverable: Every learner who completes this course produces «Your LLMOps Runbook», a tangible artifact to take back to their organization.

Curriculum live · full chapter content rolling out through 2026.

The outline, learning objectives, references, and capstone deliverable are published. Full chapter content (video, narration, exercises) ships progressively. Get notified when each chapter goes live.

References & sources

Built on cited sources — not vibes.

Every course is researched fresh against vendor documentation, regulatory sources, and peer-reviewed work. Sources used in this course:

  • MLflow Documentation (MLflow Project)
  • OpenTelemetry GenAI Semantic Conventions (OpenTelemetry)
  • Azure AI Foundry: Evaluations and Monitoring (Microsoft Learn)
  • OWASP Top 10 for LLM Applications (OWASP Foundation)
  • NIST AI Risk Management Framework (National Institute of Standards and Technology)

Course details

Track: Builder
Level: Advanced
Audience: Technical practitioner
Function: IT & Engineering
Industry: Cross-Industry
Stack: Microsoft, Open-source, Stack-agnostic
Paired Gennoor Way phase: build, sustain
Format: video, hands-on, interactive