Builder · Advanced

MLOps for LLMs

A 65-minute advanced course for ML and platform engineers — evaluation in production, drift, observability, A/B testing, and incident response.

9 chapters · Technical practitioner · Free

Last updated: 2026-05-19

Get notified when chapters ship

What you'll learn

By the end of this course you'll be able to:

Explain how LLMOps differs from classical MLOps, and which practices still carry over
Apply production evaluation patterns: online evals, sampling, and human-in-the-loop review
Detect drift in LLM systems: input, output, and behavioral
Apply cost optimization patterns that survive past the first quarter
Version prompts, models, and evals together, and roll them out safely
Build observability with OpenTelemetry, MLflow, and LangSmith, and run incident response when something breaks at 3am

Who this is for

ML engineers, platform engineers, and SREs operating LLM systems in production. Especially valuable for teams that have shipped at least one LLM feature and are now staring at the second one — knowing that ad-hoc evals, missing traces, and untracked prompt changes will not survive the next outage or the next model upgrade.

Curriculum

9 chapters · 2 hands-on exercises · capstone challenge

Each chapter ends with the learning objectives ticked off. Quizzes are auto-graded with feedback; exercises are open-ended and produce artifacts you can take to your team.

1. LLMOps vs MLOps

Identify what changes when the model is generative, non-deterministic, and third-party
Map classical MLOps practices that still apply (and the ones that don’t)

2. Model evaluation in production

QUIZ

Run online evals on sampled production traffic without leaking PII
Combine LLM-as-judge with human review on the cases that matter most
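
As a rough illustration of the first objective, the sketch below samples a fixed share of traffic deterministically and masks obvious PII before a record reaches an eval store. The function names, the 5% default rate, and the two regular expressions are assumptions made for this example, not course material; regex masking catches only simple e-mail and phone patterns and is not a complete PII control.

```python
import hashlib
import re

# Illustrative patterns only: real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def in_sample(request_id: str, rate: float) -> bool:
    """Pick roughly `rate` of requests by hashing the request id.

    Hashing keeps the decision stable across retries and services.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < rate


def redact(text: str) -> str:
    """Mask e-mail addresses and phone-like numbers."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def to_eval_record(request_id: str, prompt: str, response: str, rate: float = 0.05):
    """Return a redacted record for sampled requests, otherwise None."""
    if not in_sample(request_id, rate):
        return None
    return {
        "request_id": request_id,
        "prompt": redact(prompt),
        "response": redact(response),
    }
```

Redacting before storage, rather than at read time, means the eval store never holds the raw text; whether that trade-off suits a given team depends on its retention and audit requirements.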

3. Drift detection for LLM systems

EXERCISE

Detect input distribution shift, output drift, and behavioral drift separately
Set actionable alert thresholds, not cosmetic dashboards
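
One possible way to put a number on input distribution shift is the Population Stability Index (PSI) over a simple feature such as prompt length. The sketch below is an assumption-laden example, not the chapter's method: the bin edges, the choice of prompt length as the feature, and the 1e-6 floor for empty bins are all illustrative. It covers input drift only; output and behavioral drift need their own signals.

```python
import math
from bisect import bisect_right


def psi(baseline, current, edges):
    """Population Stability Index between two samples over fixed bin edges.

    `edges` must be sorted; values are assigned to len(edges) + 1 bins.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    base, cur = shares(baseline), shares(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


# Hypothetical prompt lengths (in characters) from two time windows.
last_month = [10, 20, 30, 40, 50, 60, 70, 80]
this_week = [v + 100 for v in last_month]
score = psi(last_month, this_week, edges=[25, 50, 75])
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but an actionable threshold should be calibrated against the system's own history rather than copied from a rule of thumb.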

4. Cost optimization patterns

QUIZ

Apply prompt compression, caching, and model-routing to control token spend
Decide when a smaller model + retrieval beats a larger model end-to-end
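
To make caching and model routing concrete, here is a minimal sketch that checks an exact-match cache first and otherwise sends short prompts to a cheaper model. The `CachedRouter` class, the character-length cutoff, and the stub model callables are hypothetical; prompt length is a crude stand-in for difficulty, and production routers often rely on a classifier or a confidence signal instead.

```python
import hashlib


class CachedRouter:
    """Exact-match response cache in front of a two-tier model router."""

    def __init__(self, small_model, large_model, max_small_chars=500):
        # `small_model` and `large_model` are any callables: prompt -> text.
        self.small_model = small_model
        self.large_model = large_model
        self.max_small_chars = max_small_chars
        self.cache = {}
        self.calls = {"small": 0, "large": 0, "cache": 0}

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            self.calls["cache"] += 1
            return self.cache[key]
        # Assumed routing rule: short prompts go to the cheaper model.
        if len(prompt) <= self.max_small_chars:
            tier, model = "small", self.small_model
        else:
            tier, model = "large", self.large_model
        self.calls[tier] += 1
        answer = model(prompt)
        self.cache[key] = answer
        return answer


# Stub models stand in for real API clients.
router = CachedRouter(lambda p: "small answer", lambda p: "large answer",
                      max_small_chars=10)
router.complete("hi")
router.complete("hi")       # served from the cache
router.complete("x" * 50)   # routed to the large model
```

The `calls` counter is there so that cache hit rate and per-tier volume can be tracked alongside token spend; an exact-match cache only helps when identical prompts recur.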

5. Versioning prompts and models together

EXERCISE

Treat prompts, model versions, and eval suites as one versioned artifact
Roll changes with canaries, shadowing, and one-command rollback
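
One way to read "one versioned artifact" is a single content hash over the prompt, the model identifier, and the eval suite, so that changing any of the three yields a new version. The sketch below assumes that reading; `ReleaseBundle`, `Registry`, and the example model id are hypothetical names, and canaries and shadow traffic are left out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseBundle:
    prompt_template: str
    model_id: str
    eval_suite: tuple  # names of the eval cases that gate this release

    def version(self) -> str:
        """Short content hash: any field change produces a new version."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


class Registry:
    """Keeps the deployment history so a rollback is a single call."""

    def __init__(self):
        self.history = []  # deployed bundles, newest last

    def deploy(self, bundle: ReleaseBundle) -> str:
        self.history.append(bundle)
        return bundle.version()

    def rollback(self) -> str:
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.history.pop()
        return self.history[-1].version()


v1 = ReleaseBundle("Summarize: {text}", "example-model-v1", ("smoke", "pii"))
v2 = ReleaseBundle("Summarize briefly: {text}", "example-model-v1", ("smoke", "pii"))
registry = Registry()
registry.deploy(v1)
registry.deploy(v2)
active = registry.rollback()  # back to the v1 bundle
```

Hashing the three parts together means a prompt tweak cannot ship under the version of an eval suite that never tested it.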

6. Observability with traces

QUIZ

Instrument LLM apps with OpenTelemetry GenAI semantic conventions
Wire MLflow or LangSmith for trace search, replay, and eval linking
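
For orientation, the sketch below builds the span name and a few attributes in the style of the OpenTelemetry GenAI semantic conventions, using only the standard library. The helper `genai_span_fields` is hypothetical, and the attribute keys reflect the conventions as best understood at the time of writing; the conventions are still evolving, so the names should be checked against the current specification. Passing the result to an actual tracer is omitted here.

```python
def genai_span_fields(operation: str, model: str,
                      input_tokens: int, output_tokens: int):
    """Return (span_name, attributes) for one LLM call.

    The span name follows the "{operation} {model}" pattern described
    in the GenAI conventions; the keys below are a small subset.
    """
    span_name = f"{operation} {model}"
    attributes = {
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    return span_name, attributes


# "example-model" is a placeholder, not a real model identifier.
name, attrs = genai_span_fields("chat", "example-model", 120, 48)
```

Keeping token counts on the span lets cost and latency be queried from the same trace data.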

7. A/B testing LLM systems

Design A/B tests that account for non-determinism and slow user feedback
Avoid the false-positive trap of judging on a single offline metric
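
As one hedged example of the statistics behind such a test, the sketch below runs a two-proportion z-test on a binary outcome such as thumbs-up rate. It assumes each request is an independent trial, which folds the model's sampling noise into the variance; the function name and the counts are illustrative. A single significant metric is still not a ship decision, which is the trap the second objective warns about.

```python
import math


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided z-test for a difference between two success rates.

    Returns (z, p_value). Assumes independent trials and a pooled rate
    strictly between 0 and 1 (otherwise the standard error is zero).
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return z, p_value


# Hypothetical counts: thumbs-up out of rated responses per arm.
z, p = two_proportion_z(success_a=500, n_a=1000, success_b=600, n_b=1000)
```

With slow user feedback the counts arrive late, so the sample size and the stopping rule are better fixed before the test starts than adjusted while watching the p-value.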

8. Incident response for LLM failures

Classify LLM incidents (regression, drift, abuse, vendor outage) and triage each
Run a clean post-mortem that produces durable eval cases, not just promises

Capstone: Your LLMOps runbook

Author an LLMOps runbook covering evals, drift, cost, rollouts, and incidents
Define the on-call response pattern your platform team will operate against

Capstone deliverable: Every learner who completes this course produces «Your LLMOps Runbook» — a tangible artifact you take back to your organization.

Curriculum live · full chapter content rolling out through 2026.

The outline, learning objectives, references, and capstone deliverable are published. Full chapter content (video, narration, exercises) ships progressively. Get notified when each chapter goes live.

References & sources

Built on cited sources — not vibes.

Every course is researched fresh against vendor documentation, regulatory sources, and peer-reviewed work. Sources used in this course:

MLflow Documentation · MLflow Project
OpenTelemetry: GenAI Semantic Conventions · OpenTelemetry
Azure AI Foundry: Evaluations and Monitoring · Microsoft Learn
OWASP Top 10 for LLM Applications · OWASP Foundation
NIST AI Risk Management Framework · National Institute of Standards and Technology

Course details

Track: Builder
Level: Advanced
Audience: Technical practitioner
Function: IT & Engineering
Industry: Cross-Industry
Stack: Microsoft, Open-source, Stack-agnostic
Paired Gennoor Way phase: build, sustain
Format: video, hands-on, interactive

You finished the course. Now what?

From course to outcome.

Reading this course is step one. The next step is applying it where you work. Here's how Gennoor helps — without the deck, without the pitch.

Run this for your team

A 2-day workshop or virtual cohort for up to 25 of your people, with exercises run on your data and a 30-day adoption plan.

From $5k · 2 weeks · function-specific

Talk to us about a workshop

Apply this to your data

A 4–6 week pilot that takes what you learned and ships a working system inside your environment. Fixed scope, fixed price, code transferred day one.

From $25k · 6 weeks · production-grade

Scope a pilot

Just want to talk?

Free 30-minute call. No deck, no pitch. We listen to your situation and tell you honestly what makes sense — even if it isn't us.

Free · no commitment · 30 minutes

Book a call

Or just keep learning. We recommend next:

builder

RAG Architectures — Foundations

60 min

builder

Open-Source LLMs for Enterprise

65 min
