Evals can validate which tools an agent calls during execution. Use step validators to check that your agent follows the expected workflow, calling the right tools with the correct parameters:

Search Agent

agent.py
from timbal import Agent, Tool

def web_search(query: str) -> str:
    """Search the web for information about a given query."""
    return f"Web search results for '{query}': Found comprehensive information about renewable energy trends, including solar and wind power growth statistics."

def search_table(table_name: str, criteria: str) -> str:
    """Search within a specific table for data matching criteria."""
    return f"Table search results from '{table_name}' for '{criteria}': Found specific data entries."

agent = Agent(
    name="search_agent",
    model="openai/gpt-4.1-mini",
    tools=[
        Tool(handler=web_search, description="Search the web for information about a given query."),
        Tool(handler=search_table, description="Search within a specific table for data matching criteria."),
    ],
)

Running Evaluations

python -m timbal.eval --fqn agent.py::agent --tests evals.yaml

Example Evaluation Output

Successful Validation

summary.json
{
  "total_files": 1,
  "total_tests": 1,
  "total_turns": 1,
  "outputs_passed": 0,
  "outputs_failed": 0,
  "steps_passed": 1,
  "steps_failed": 0,
  "usage_passed": 0,
  "usage_failed": 0,
  "execution_errors": 0,
  "tests_failed": []
}

Failed Validation

summary.json
{
  "total_files": 1,
  "total_tests": 1,
  "total_turns": 1,
  "outputs_passed": 0,
  "outputs_failed": 0,
  "steps_passed": 0,
  "steps_failed": 1,
  "usage_passed": 0,
  "usage_failed": 0,
  "execution_errors": 0,
  "tests_failed": [
    {
      "test_name": "eval_search_workflow",
      "test_path": "evals.yaml::eval_search_workflow",
      "input": {
        "text": "Find information about renewable energy trends in the web"
      },
      "reason": [
        "steps"
      ],
      "execution_error": null,
      "output_passed": null,
      "output_explanations": [],
      "actual_output": {
        "text": "The table \"Energy_Trends_2024\" contains specific data entries related to renewable energy trends. Would you like a summary of key trends, statistical data, or detailed entries from the table? Please specify your preference.",
        "files": []
      },
      "expected_output": null,
      "steps_passed": false,
      "steps_explanations": [
        "No step found with tool 'web_search' and input containing {'query': 'renewable energy trends'}.",
        "Step found with tool 'search_table'."
      ],
      "actual_steps": [
        {
          "tool": "search_table",
          "input": {
            "table_name": "Energy_Trends_2024",
            "criteria": "renewable energy"
          }
        }
      ],
      "expected_steps": {
        "contains": [
          "web_search"
        ],
        "not_contains": [
          "search_table"
        ]
      },
      "usage_passed": true,
      "usage_explanations": []
    }
  ]
}
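When a run fails, the `tests_failed` entries carry everything needed to see why. The following is a minimal sketch of a reader for that structure, assuming only the field names visible in the summary above; `summarize_failures` is a hypothetical helper, not part of Timbal.

```python
import json

def summarize_failures(summary: dict) -> list[str]:
    """Return one readable line per failed test, followed by its step explanations."""
    lines = []
    for test in summary.get("tests_failed", []):
        reasons = ", ".join(test.get("reason", []))
        lines.append(f"{test['test_path']} failed on: {reasons}")
        for explanation in test.get("steps_explanations", []):
            lines.append(f"  - {explanation}")
    return lines

# A trimmed copy of the failed-validation summary shown above.
sample = json.loads("""
{
  "steps_failed": 1,
  "tests_failed": [
    {
      "test_name": "eval_search_workflow",
      "test_path": "evals.yaml::eval_search_workflow",
      "reason": ["steps"],
      "steps_explanations": [
        "No step found with tool 'web_search' and input containing {'query': 'renewable energy trends'}.",
        "Step found with tool 'search_table'."
      ]
    }
  ]
}
""")

for line in summarize_failures(sample):
    print(line)
```

To use this on a real run, load the summary file produced by the eval command with `json.load` instead of the inline sample. Where that file is written is not stated here, so check your own output directory.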

Validation Configuration

evals.yaml
- name: eval_search_workflow
  description: Test that the search agent uses web_search and does not use search_table
  turns:
    - input: "Find information about renewable energy trends in the web"
      steps:
        validators:
          contains:
            - name: web_search
              input:
                query: "renewable energy trends"
          not_contains:
            - name: search_table
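
To make the `contains` / `not_contains` semantics concrete, here is a rough sketch of how such rules could be checked against the steps an agent actually took. It is an illustration only, not Timbal's implementation: `step_matches` and `check_steps` are hypothetical names, and the case-insensitive substring match on inputs is an assumption.

```python
def step_matches(step: dict, expected: dict) -> bool:
    """True if the step used the expected tool and its input contains the expected values."""
    if step["tool"] != expected["name"]:
        return False
    for key, value in expected.get("input", {}).items():
        actual = step.get("input", {}).get(key)
        # Assumption: a parameter matches when it contains the expected text (case-insensitive).
        if actual is None or str(value).lower() not in str(actual).lower():
            return False
    return True

def check_steps(actual_steps, contains=(), not_contains=()):
    """Return (passed, explanations) for the contains / not_contains rules."""
    explanations = []
    for expected in contains:
        if not any(step_matches(step, expected) for step in actual_steps):
            explanations.append(f"Missing expected step: {expected['name']}")
    for forbidden in not_contains:
        if any(step_matches(step, forbidden) for step in actual_steps):
            explanations.append(f"Forbidden step present: {forbidden['name']}")
    return (not explanations, explanations)

# The steps recorded in the failed run: the agent searched a table instead of the web.
actual_steps = [
    {"tool": "search_table",
     "input": {"table_name": "Energy_Trends_2024", "criteria": "renewable energy"}},
]
passed, explanations = check_steps(
    actual_steps,
    contains=[{"name": "web_search", "input": {"query": "renewable energy trends"}}],
    not_contains=[{"name": "search_table"}],
)
print(passed, explanations)
```

With the failed run's steps, both rules are violated: `web_search` is missing and `search_table` is present, which mirrors the two entries in `steps_explanations`.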

Key Features

  • Tools Used Validation: Verify agents use the correct tools with proper parameters
  • Tool Sequence Validation: Check that tools are used in the expected order
  • Workflow Enforcement: Ensure agents follow expected business logic
  • Semantic Tool Validation: Use an LLM to validate tool usage appropriateness
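
The example on this page checks only which tools were used, not their order. As an illustration of what an order check involves, the sketch below tests whether the expected tools appear in the actual call list in the same relative order. `tools_in_order` is a hypothetical helper and does not reflect Timbal's validator syntax, which is not shown here.

```python
def tools_in_order(actual_tools: list[str], expected_order: list[str]) -> bool:
    """True if expected_order is a subsequence of actual_tools (other calls may appear in between)."""
    remaining = iter(actual_tools)
    # Each `in` test consumes the iterator up to the match, so later tools
    # must be found after earlier ones.
    return all(tool in remaining for tool in expected_order)

calls = ["web_search", "search_table"]
print(tools_in_order(calls, ["web_search", "search_table"]))
print(tools_in_order(calls, ["search_table", "web_search"]))
```

This subsequence check tolerates extra tool calls between the expected ones. A stricter check would compare the two lists for equality.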