MLOps for LLMs
A 65-minute advanced course for ML and platform engineers — evaluation in production, drift, observability, A/B testing, and incident response.
Last updated: 2026-05-19
What you'll learn
By the end of this course you'll be able to:
- Explain how LLMOps differs from classical MLOps, and which practices still carry over
- Apply production evaluation patterns: online evals, sampling, and human-in-the-loop review
- Detect drift in LLM systems: input, output, and behavioral drift
- Apply cost optimization patterns that survive past the first quarter
- Version prompts, models, and evals together, and roll them out safely
- Build observability with OpenTelemetry, MLflow, and LangSmith, and respond to incidents when something breaks at 3am
Who this is for
ML engineers, platform engineers, and SREs operating LLM systems in production. Especially valuable for teams that have shipped at least one LLM feature and are now staring at the second one — knowing that ad-hoc evals, missing traces, and untracked prompt changes will not survive the next outage or the next model upgrade.
Curriculum
9 chapters · 2 hands-on exercises · capstone challenge
Each chapter ends with the learning objectives ticked off. Quizzes are auto-graded with feedback; exercises are open-ended and produce artifacts you can take to your team.
1. LLMOps vs MLOps
- Identify what changes when the model is generative, non-deterministic, and third-party
- Map classical MLOps practices that still apply (and the ones that don’t)
2. Model evaluation in production
- Run online evals on sampled production traffic without leaking PII
- Combine LLM-as-judge with human review on the cases that matter most
3. Drift detection for LLM systems
- Detect input distribution shift, output drift, and behavioral drift separately
- Set actionable alert thresholds, not cosmetic dashboards
4. Cost optimization patterns
- Apply prompt compression, caching, and model-routing to control token spend
- Decide when a smaller model + retrieval beats a larger model end-to-end
5. Versioning prompts and models together
- Treat prompts, model versions, and eval suites as one versioned artifact
- Roll changes with canaries, shadowing, and one-command rollback
6. Observability with traces
- Instrument LLM apps with OpenTelemetry GenAI semantic conventions
- Wire MLflow or LangSmith for trace search, replay, and eval linking
7. A/B testing LLM systems
- Design A/B tests that account for non-determinism and slow user feedback
- Avoid the false-positive trap of judging on a single offline metric
8. Incident response for LLM failures
- Classify LLM incidents (regression, drift, abuse, vendor outage) and triage each
- Run a clean post-mortem that produces durable eval cases, not just promises
Capstone: Your LLMOps runbook
- Author an LLMOps runbook covering evals, drift, cost, rollouts, and incidents
- Define the on-call response pattern your platform team will operate against
Capstone deliverable: Every learner who completes this course produces «Your LLMOps Runbook» — a tangible artifact you take back to your organization.
Curriculum live · full chapter content rolling out through 2026.
The outline, learning objectives, references, and capstone deliverable are published. Full chapter content (video, narration, exercises) ships progressively.
References & sources
Built on cited sources — not vibes.
Every course is researched fresh against vendor documentation, regulatory sources, and peer-reviewed work. Sources used in this course:
- MLflow Documentation (MLflow Project)
- OpenTelemetry GenAI Semantic Conventions (OpenTelemetry)
- Azure AI Foundry: Evaluations and Monitoring (Microsoft Learn)
- OWASP Top 10 for LLM Applications (OWASP Foundation)
- NIST AI Risk Management Framework (National Institute of Standards and Technology)
Course details
Track: Builder
Level: Advanced
Audience: Technical practitioner
Function: IT & Engineering
Industry: Cross-Industry
Stack: Microsoft, Open-source, Stack-agnostic
Paired Gennoor Way phase: build, sustain
Format: video, hands-on, interactive
You finished the course. Now what?
From course to outcome.
Reading this course is step one. The next step is applying it where you work. Here's how Gennoor helps — without the deck, without the pitch.
Run this for your team
A 2-day workshop or virtual cohort for up to 25 of your people, with exercises run on your data and a 30-day adoption plan.
From $5k · 2 weeks · function-specific
Apply this to your data
A 4–6 week pilot that takes what you learned and ships a working system inside your environment. Fixed scope, fixed price, code transferred day one.
From $25k · 6 weeks · production-grade
Just want to talk?
Free 30-minute call. No deck, no pitch. We listen to your situation and tell you honestly what makes sense — even if it isn't us.
Free · no commitment · 30 minutes