Evaluation Frameworks Guide

What you'll learn

✓ Understand why evaluation is the most critical gap in AI Engineering and adopt evaluation-driven development

✓ Master the metrics taxonomy: reference-based (BLEU, ROUGE, BERTScore), reference-free, semantic, and custom metrics

✓ Design professional golden datasets with annotation strategies, synthetic data, and versioning

✓ Evaluate chatbots with response quality metrics, safety checks, multi-turn evaluation, and TruLens

✓ Evaluate RAG pipelines with RAGAS: faithfulness, answer relevancy, context precision, and context recall

✓ Evaluate AI agents with trajectory evaluation, tool call accuracy, and task completion metrics

✓ Implement LLM-as-judge with calibrated prompts, bias mitigation, pairwise comparison, and multi-judge consensus

✓ Build production evaluation pipelines with CI/CD gates, continuous monitoring, regression detection, and alerting

Who is it for?

• AI Engineers who build chat, RAG, or agent systems and need to measure their quality rigorously
• Developers deploying AI systems to production without knowing whether they actually work well
• Engineers who need to implement evaluation in their company's CI/CD pipeline
• Teams transitioning from "vibes-based evaluation" to data-driven, automated quality metrics
• Professionals who want to understand LLM-as-judge, RAGAS, and TruLens for production use

Requirements

• Completed Advanced RAG Techniques Guide (#8) or experience building RAG pipelines
• Completed LangChain & LangGraph Guide (#9) or equivalent framework experience
• Completed Building AI Agents Guide (#11) or experience building agents with tool use
• Intermediate Python (functions, classes, async, Pydantic basics)
• Experience with at least one AI system in development or production
• At least one LLM API key (OpenAI recommended)
• Python 3.11+ installed

Course content

Module 1: Why Evaluate AI Systems? Creator's Guide (8 lessons)

Module 2: Metrics Taxonomy. Creator's Guide (8 lessons)

Module 3: Golden Datasets and Test Suites. Creator's Guide (8 lessons)

Module 4: Evaluating Chat and Conversations. Creator's Guide (8 lessons)

Module 5: Evaluating RAG with RAGAS. Creator's Guide (8 lessons)

Module 6: Evaluating Agents. Creator's Guide (8 lessons)

Module 7: LLM-as-Judge and Automated Evaluation. Creator's Guide (8 lessons)

Module 8: Evaluation Pipelines in Production. Creator's Guide (8 lessons)

Reviews

What students say

These reviews are from enrolled students who completed at least 50% of the course. We moderate reviews only on content grounds (spam, offensive language, personal data), never for being critical or negative.

No approved reviews yet.
