What are Evals?
Evals are automated tests that validate your agent's behavior, outputs, and execution patterns. They help you ensure your agents perform correctly and consistently across different scenarios.

Why Evals Matter
AI agents are non-deterministic: the same input can yield different results. Evals help you:

- Validate outputs: Ensure agents produce correct responses
- Check tool usage: Verify agents use the right tools with correct inputs
- Monitor performance: Track execution time and token usage
- Catch regressions: Prevent breaking changes during development
- Test execution patterns: Validate sequential and parallel tool execution
How Evals Work
Timbal's eval system uses a YAML-based test definition format with a powerful validator system.

Core Components
- Validators: 20+ validators for checking outputs, patterns, types, and more
- Flow Validators: Validate execution sequences and parallel tool calls
- LLM Validators: AI-powered semantic validation for natural language
- CLI: Command-line interface for running and discovering evals
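As a rough sketch of how these components fit together in one definition: the field names `output` and `seq!` come from the eval structure table later on this page, but the validator names below are illustrative assumptions, not confirmed Timbal identifiers.

```yaml
# Sketch only — validator names are assumed for illustration.
- name: checkout_smoke_test
  runnable: app.py::checkout_agent   # discovered and executed by the CLI
  output:
    contains: "order confirmed"      # assumed output validator
  seq!:                              # flow validator: expected tool-call order
    - validate_cart
    - charge_card
```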
Quick Start
1. Create a test file
Create a file named eval_greeting.yaml:
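A minimal sketch follows. Only the field names come from the eval structure table below; the top-level list layout and the `contains` validator name are assumptions for illustration.

```yaml
# eval_greeting.yaml — minimal sketch; `contains` is an assumed validator name.
- name: greeting_mentions_name
  description: The agent should greet the user by name.
  runnable: agent.py::agent        # file.py::name, per the eval structure table
  params:
    prompt: "Hi, I'm Ada!"
  output:
    contains: "Ada"                # assumed string-matching validator
```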
2. Run your evals
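The exact invocation depends on your Timbal version; as a hedged sketch, assuming the CLI exposes an eval subcommand that auto-discovers eval_*.yaml files:

```bash
# Hypothetical invocation — consult your Timbal version's CLI help for the
# exact subcommand and discovery rules.
timbal eval
```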
3. View results
The CLI displays pytest-style output with pass/fail status, duration, and detailed failure information.

Eval Structure
Each eval consists of:

| Field | Description | Required |
|---|---|---|
| `name` | Unique identifier for the eval | Yes |
| `runnable` | Path to the runnable (`file.py::name`) | Yes |
| `params` | Input parameters for the runnable | No |
| `description` | Human-readable description | No |
| `tags` | List of tags for filtering | No |
| `timeout` | Maximum execution time in milliseconds | No |
| `output` | Validators for the final output | No |
| `elapsed` | Validators for total execution time | No |
| `seq!` | Sequence flow validator | No |
Eval names must be unique across all eval files. The CLI will error if duplicate names are found.
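Putting the fields together, a fuller (still hypothetical) eval might look like the sketch below; the validator expressions under `output` and `elapsed` are assumptions for illustration, while the field names and units match the table above.

```yaml
# Sketch only — `contains` and `lt` are assumed validator names.
- name: order_lookup_full
  description: Agent looks up an order and summarizes its status.
  runnable: agent.py::order_agent
  tags: [orders, smoke]
  timeout: 30000                   # milliseconds, per the table above
  params:
    prompt: "What's the status of order 1234?"
  output:
    contains: "1234"               # assumed output validator
  elapsed:
    lt: 10000                      # assumed comparison validator (ms)
  seq!:                            # expected tool-call sequence
    - lookup_order
```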