Example
This example demonstrates how to validate agent outputs using multiple validators, including content checks, format validation, timing, and usage metrics.

Eval Configuration
evals.yaml
In this example, we use `output_text_tokens` instead of `output_tokens` because the agent uses OpenAI (`openai/gpt-5.2`). For Anthropic models, use `output_tokens` instead. See Validating Token Usage for more details.

Agent Implementation
agent.py
Running Evaluations
How It Works
- Output Validation: Multiple validators check the agent's response for required content (`contains_all!`), excluded content (`not_contains!`), and format (`pattern!`).
- Timing Validation: The `elapsed` validator ensures the agent responds within the specified time limit.
- Usage Validation: Span-level validators track resource consumption, such as token usage for LLM calls.
- Combined Validators: All validators must pass for the eval to succeed.
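The validator types above could combine into an `evals.yaml` along these lines. This is an illustrative sketch only: the validator names (`contains_all!`, `not_contains!`, `pattern!`, `elapsed`, `lt!`, `lte!`, `llm.usage.output_text_tokens`) come from this page, but the surrounding field names and nesting are assumptions, not the framework's exact schema.

```yaml
# Hypothetical structure -- consult the real evals.yaml for the exact schema.
evals:
  - name: example-eval
    input: "..."
    validators:
      - contains_all!: ["keyword-one", "keyword-two"]   # required content
      - not_contains!: ["forbidden-term"]               # excluded content
      - pattern!: "^[A-Z].*"                            # format (regex)
      - elapsed:
          lt!: 10                                       # seconds
      - llm.usage.output_text_tokens:                   # span-level usage (OpenAI)
          lte!: 500
```

Because all validators must pass, adding a validator can only make an eval stricter, never looser.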
Evaluation Results
Successful Validation
When all validators pass, the eval is reported as successful.

Failed Validation
When any validator fails, the eval is reported as failed.

Key Features
- Content Validation: Verify required keywords (`contains_all!`) and exclude unwanted content (`not_contains!`)
- Format Validation: Ensure responses follow the expected structure with `pattern!` regex validation
- Time Validation: Monitor execution time with `elapsed` validators (`lt!`, `lte!`, etc.)
- Usage Validation: Track resource consumption with span-level `usage` validators (e.g., `llm.usage.output_text_tokens` for OpenAI or `llm.usage.output_tokens` for Anthropic)
- Combined Validators: Use multiple validators together; all must pass for the eval to succeed
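To make the content-validation semantics concrete, here is a minimal standalone sketch of how `contains_all!`, `not_contains!`, and `pattern!` checks compose, with all checks AND-ed together as described above. The function name and signature are hypothetical illustrations, not the framework's API.

```python
import re

def validate_output(text, required=(), excluded=(), pattern=None):
    """Return True only if every configured check passes (checks are AND-ed)."""
    # contains_all!: every required keyword must appear in the output
    if any(word not in text for word in required):
        return False
    # not_contains!: no excluded keyword may appear in the output
    if any(word in text for word in excluded):
        return False
    # pattern!: the output must match the regex, if one is given
    if pattern is not None and not re.search(pattern, text):
        return False
    return True

answer = "The capital of France is Paris."
ok = validate_output(
    answer,
    required=["Paris", "France"],
    excluded=["London"],
    pattern=r"capital of \w+ is \w+",
)
print(ok)  # True
```

A single failing check, such as adding `"Paris"` to `excluded`, flips the result to `False`, mirroring how one failing validator fails the whole eval.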