Intermediate course · Bootcamp access

Evaluation Frameworks Guide

64 lessons

8 modules

What you'll learn

Understand why evaluation is the most critical gap in AI Engineering and adopt evaluation-driven development
Master metrics taxonomy: reference-based (BLEU, ROUGE, BERTScore), reference-free, semantic, and custom metrics
Design professional golden datasets with annotation strategies, synthetic data, and versioning
Evaluate chatbots with response quality metrics, safety checks, multi-turn evaluation, and TruLens
Evaluate RAG pipelines with RAGAS: faithfulness, answer relevancy, context precision and recall
Evaluate AI agents with trajectory evaluation, tool call accuracy, and task completion metrics
Implement LLM-as-judge with calibrated prompts, bias mitigation, pairwise comparison, and multi-judge consensus
Build production evaluation pipelines with CI/CD gates, continuous monitoring, regression detection, and alerting

Who is it for?

  • AI Engineers who build chat, RAG, or agent systems and need to measure their quality rigorously
  • Developers deploying AI systems to production without knowing if they actually work well
  • Engineers who need to implement evaluation in their company's CI/CD pipeline
  • Teams transitioning from "vibes-based evaluation" to data-driven, automated quality metrics
  • Professionals who want to understand LLM-as-judge, RAGAS, and TruLens for production use

Requirements

  • Completed Advanced RAG Techniques Guide (#8) or experience building RAG pipelines
  • Completed LangChain & LangGraph Guide (#9) or equivalent framework experience
  • Completed Building AI Agents Guide (#11) or experience building agents with tool use
  • Intermediate Python (functions, classes, async, Pydantic basics)
  • Experience with at least one AI system in development or production
  • At least one LLM API key (OpenAI recommended)
  • Python 3.11+ installed

Course content

Module 1: Why Evaluate AI Systems? (Creator's Guide), 8 lessons
Module 2: Metrics Taxonomy (Creator's Guide), 8 lessons
Module 3: Golden Datasets and Test Suites (Creator's Guide), 8 lessons
Module 4: Chat and Conversation Evaluation (Creator's Guide), 8 lessons
Module 5: RAG Evaluation with RAGAS (Creator's Guide), 8 lessons
Module 6: Agent Evaluation (Creator's Guide), 8 lessons
Module 7: LLM-as-Judge and Automated Evaluation (Creator's Guide), 8 lessons
Module 8: Evaluation Pipelines in Production (Creator's Guide), 8 lessons