Evals are automated tests that validate your agent’s behavior, outputs, and execution patterns. They help you ensure your agents perform correctly and consistently across different scenarios.

Why Evals Matter

AI agents are non-deterministic: the same input can yield different results. Evals help you:
  • Validate outputs: Ensure agents produce correct responses
  • Check tool usage: Verify agents use the right tools with correct inputs
  • Monitor performance: Track execution time and token usage
  • Catch regressions: Prevent breaking changes during development
  • Test execution patterns: Validate sequential and parallel tool execution

How Evals Work

Timbal evals are defined in YAML and use a validator syntax to assert on outputs, timing, and tool usage. The example below checks that the agent's answer contains a time (e.g. 14:30), that execution stays under the elapsed limit, and that the agent calls get_datetime between two llm steps:
- name: time_in_madrid
  description: Test that agent returns the time in Madrid
  runnable: agent.py::agent
  tags: ["datetime", "smoke"]
  timeout: 30000
  
  params:
    prompt: "what time is it in madrid"
  
  output:
    type!: "string"
    contains!: ":"
    pattern!: "\\d{1,2}:\\d{2}"
  
  elapsed:
    lt!: 6000
  
  seq!:
    - llm
    - get_datetime
    - llm

Quick Start

1. Create a test file

Create a file named eval_greeting.yaml:
- name: greeting_test
  description: Verify the agent greets users appropriately
  runnable: agent.py::my_agent
  tags: ["greeting", "basic"]
  
  params:
    prompt: "Hi there!"
  
  output:
    not_null!: true
    type!: "string"
    semantic!: "A polite greeting that acknowledges the user"
  
  elapsed:
    lt!: 5000
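
The semantic! validator judges meaning rather than exact wording, which suits an open-ended greeting. If you want a stricter, pattern-based check, the pattern! validator from the earlier example also works here; the regex below is purely illustrative:

  output:
    not_null!: true
    type!: "string"
    pattern!: "(?i)\\b(hi|hello|hey)\\b"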

2. Run your evals

python -m timbal.evals.cli path/to/eval_greeting.yaml

3. View results

The CLI displays pytest-style output with pass/fail status, duration, and detailed failure information:
========================= timbal evals =========================
collected 1 eval

eval_greeting.yaml
  greeting_test ......................................... PASSED (0.45s)
    tags: greeting, basic
    ├── output
    │   ├── not_null! ✓
    │   ├── type! ✓
    │   └── semantic! ✓
    └── elapsed
        └── lt! ✓

========================= 1 passed in 0.45s =========================

Eval Structure

Each eval consists of:
Field          Description                                Required
name           Unique identifier for the eval             Yes
runnable       Path to the runnable (file.py::name)       Yes
params         Input parameters for the runnable          No
description    Human-readable description                 No
tags           List of tags for filtering                 No
timeout        Maximum execution time in milliseconds     No
output         Validators for the final output            No
elapsed        Validators for total execution time        No
seq!           Sequence flow validator                    No

Eval names must be unique across all eval files. The CLI will error if duplicate names are found.
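
Only name and runnable are required, so a minimal eval can be as short as the sketch below (the name and runnable path are placeholders); with no validators declared, it would presumably only confirm that the runnable executes without raising an error:

- name: smoke_test
  runnable: agent.py::my_agent
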
See Writing Evals for the complete syntax reference.

Next Steps