Intermediate course · Bootcamp access
Evaluation Frameworks Guide
64 lessons
8 modules
What you'll learn
✓ Understand why evaluation is the most critical gap in AI Engineering and adopt evaluation-driven development
✓ Master metrics taxonomy: reference-based (BLEU, ROUGE, BERTScore), reference-free, semantic, and custom metrics
✓ Design professional golden datasets with annotation strategies, synthetic data, and versioning
✓ Evaluate chatbots with response quality metrics, safety checks, multi-turn evaluation, and TruLens
✓ Evaluate RAG pipelines with RAGAS: faithfulness, answer relevancy, context precision, and context recall
✓ Evaluate AI agents with trajectory evaluation, tool call accuracy, and task completion metrics
✓ Implement LLM-as-judge with calibrated prompts, bias mitigation, pairwise comparison, and multi-judge consensus
✓ Build production evaluation pipelines with CI/CD gates, continuous monitoring, regression detection, and alerting
Who is this for?
- AI Engineers who build chat, RAG, or agent systems and need to measure their quality rigorously
- Developers deploying AI systems to production without knowing if they actually work well
- Engineers who need to implement evaluation in their company's CI/CD pipeline
- Teams transitioning from "vibes-based evaluation" to data-driven, automated quality metrics
- Professionals who want to understand LLM-as-judge, RAGAS, and TruLens for production use
Prerequisites
- Completed Advanced RAG Techniques Guide (#8) or experience building RAG pipelines
- Completed LangChain & LangGraph Guide (#9) or equivalent framework experience
- Completed Building AI Agents Guide (#11) or experience building agents with tool use
- Intermediate Python (functions, classes, async, Pydantic basics)
- Experience with at least one AI system in development or production
- At least one LLM API key (OpenAI recommended)
- Python 3.11+ installed
Course content
Module 1: Why Evaluate AI Systems? (Creator's Guide) · 8 lessons
Module 2: Metrics Taxonomy (Creator's Guide) · 8 lessons
Module 3: Golden Datasets and Test Suites (Creator's Guide) · 8 lessons
Module 4: Chat and Conversation Evaluation (Creator's Guide) · 8 lessons
Module 5: RAG Evaluation with RAGAS (Creator's Guide) · 8 lessons
Module 6: Agent Evaluation (Creator's Guide) · 8 lessons
Module 7: LLM-as-Judge and Automated Evaluation (Creator's Guide) · 8 lessons
Module 8: Production Evaluation Pipelines (Creator's Guide) · 8 lessons