Basic Usage
Run evals from the command line:
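A minimal sketch, assuming the runner is exposed as a `timbal eval` subcommand (substitute your installed entry point) and that evals live in an `evals/` directory; both names are illustrative:

```bash
# Run every eval discovered under the evals/ directory
timbal eval evals/

# Run a single eval file
timbal eval evals/eval_search.yaml
```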
CLI Options

| Option | Description |
|---|---|
| `path` | Path to eval file or directory (positional) |
| `--runnable` | Fully qualified name of the runnable (`file.py::name`) |
| `--log-level` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `-s`, `--no-capture` | Disable stdout/stderr capture (show output live) |
| `-t`, `--tags` | Filter evals by tags (comma-separated) |
| `-V`, `--version` | Show version information |
Runnable Specification
The runnable can be specified in two ways:
1. Via the `--runnable` CLI flag, as shown below
2. In `evalconf.yaml` (see Configuration File)
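A sketch of the CLI form; the `timbal eval` invocation and the `agent.py::agent` target are placeholders, not names from this project:

```bash
# Point the runner at a specific runnable using the file.py::name syntax
timbal eval evals/ --runnable agent.py::agent
```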
Tag Filtering
Filter evals by tags using `-t` or `--tags`:
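For example, assuming your evals carry tags such as `smoke` and `search` (illustrative names, command name assumed as above):

```bash
# Run only evals tagged "smoke"
timbal eval evals/ -t smoke

# Tags are comma-separated
timbal eval evals/ --tags smoke,search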
Output Capture
By default, stdout/stderr from your agent is captured and only shown on failure. Use `-s` to see output live:
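For example (again assuming a `timbal eval` entry point):

```bash
# Stream the agent's stdout/stderr as the evals run
timbal eval evals/ -s
```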
File Discovery
Naming Patterns
Timbal discovers eval files matching these patterns:
- `eval*.yaml` (e.g., `eval_search.yaml`, `evals.yaml`)
- `*eval.yaml` (e.g., `search_eval.yaml`, `my_eval.yaml`)
Directory Structure
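An illustrative layout (file and directory names are hypothetical) that the patterns above would pick up:

```
project/
├── evalconf.yaml            # shared configuration
├── agent.py                 # the runnable under test
└── evals/
    ├── eval_search.yaml     # matches eval*.yaml
    ├── eval_summarize.yaml
    └── smoke/
        └── smoke_eval.yaml  # matches *eval.yaml
```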
Configuration File
Create an `evalconf.yaml` in your project root for shared configuration:
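A hypothetical sketch; the keys below simply mirror the documented CLI options and are not a confirmed schema:

```yaml
# evalconf.yaml (illustrative; key names are assumptions)
runnable: agent.py::agent   # same file.py::name form as --runnable
log_level: INFO
tags:
  - smoke
```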
Output Format
Successful Run
Failed Run
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All evals passed |
| 1 | One or more evals failed |
| 2 | Configuration or setup error |
CI/CD Integration
GitHub Actions
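A minimal workflow sketch, assuming the package installs as `timbal` from PyPI and exposes a `timbal eval` command; adjust the install and invocation to your setup:

```yaml
# .github/workflows/evals.yml (illustrative)
name: Evals
on: [push, pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install timbal            # assumed package name
      - run: timbal eval evals/ -t smoke   # a non-zero exit code fails the job
```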
GitLab CI
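An equivalent sketch for GitLab, with the same caveats about the package and command names:

```yaml
# .gitlab-ci.yml (illustrative)
evals:
  image: python:3.12
  script:
    - pip install timbal            # assumed package name
    - timbal eval evals/ -t smoke   # the job fails if any eval fails
```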
Pre-commit Hook
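A plain Git hook sketch that runs the fast evals before each commit; the paths, tag, and command name are assumptions:

```bash
#!/bin/sh
# .git/hooks/pre-commit (illustrative; make executable with chmod +x)
# The commit is aborted whenever the runner exits non-zero.
exec timbal eval evals/ -t smoke
```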
Debugging
Verbose Output
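Use the documented `--log-level` option to see more detail (the `timbal eval` invocation itself is assumed):

```bash
# Show debug-level logs while the evals run
timbal eval evals/ --log-level DEBUG

# Combine with -s to also stream the agent's own stdout/stderr
timbal eval evals/ --log-level DEBUG -s
```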
Run Single Eval
Test a specific eval during development:
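For example, pointing the runner at a single file (the file name is illustrative):

```bash
timbal eval evals/eval_search.yaml -s --log-level DEBUG
```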
Environment Variables
Set environment variables for testing:
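A sketch using standard shell syntax; the variable names are examples, not ones the runner requires:

```bash
# Inline, for a single run
MY_API_KEY=... timbal eval evals/

# Or export for the whole shell session
export MY_API_KEY=...
timbal eval evals/
```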
Best Practices
Organize evals by speed
Separate fast smoke tests from slow regression tests:
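One way to split them, shown with hypothetical directories and tags:

```bash
# Fast checks on every change
timbal eval evals/smoke/ -t smoke

# Full regression suite, e.g. on a nightly schedule
timbal eval evals/regression/
```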
Use meaningful exit codes
Check exit codes in scripts:
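A sketch that branches on the documented exit codes (0 all passed, 1 failures, 2 configuration or setup error); the command name is assumed as before:

```bash
#!/bin/sh
timbal eval evals/
status=$?

case "$status" in
  0) echo "All evals passed" ;;
  1) echo "One or more evals failed" ;;
  2) echo "Configuration or setup error" ;;
esac
exit "$status"
```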
Set timeouts
Prevent hanging tests with timeouts (in milliseconds):
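A hypothetical eval-file snippet; the `timeout` key and its placement are assumptions based on the prose above, not a confirmed schema:

```yaml
# eval_search.yaml (illustrative)
timeout: 30000   # 30 seconds, expressed in milliseconds
```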
Use tags for filtering
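For example, run the cheap subset locally and the full suite in CI (tag names are illustrative, command name assumed as above):

```bash
timbal eval evals/ -t smoke   # quick local check
timbal eval evals/            # full suite in CI
```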