Tool usage validation ensures your agent uses the right tools with correct parameters. Verify tool selection, parameter values, execution order, and workflow compliance.

Search Agent

agent.py
from timbal import Agent
from timbal.tools import WebSearch


agent = Agent(
    name="search_agent",
    model="openai/gpt-4.1-mini",
    tools=[WebSearch()],
)

Running Evaluations

python -m timbal.eval --fqn agent.py::agent --tests evals.yaml

Validation Configuration

evals.yaml
- name: eval_search_workflow
  description: Test that the search agent uses the web_search tool with the correct query parameter
  turns:
    - input: "Find information about renewable energy trends in the web"
      steps:
        validators:
          contains:
            - name: web_search
          not_contains:
            - name: search_table

How It Works

  1. Tool Execution: The agent executes tools during the turn, and each tool call is captured in the trace.
  2. Step Extraction: The evaluation engine extracts tool calls from the trace, including the tool name and input parameters.
  3. Validation: Validators check if the actual steps match the expected configuration:
    • contains: Verifies required tools are used
    • not_contains: Ensures unwanted tools are not used
    • equals: Validates exact step sequence
    • semantic: Uses LLM to validate tool appropriateness
  4. Input Parameters: Tool input parameters are captured and displayed in actual_steps, allowing you to verify correct parameter values.
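The contains and not_contains checks above amount to a membership test over the tool names captured in the trace. The sketch below illustrates that logic; it is a minimal illustration, not the actual Timbal evaluation engine, and the function name and step-dict shapes are taken from the `actual_steps` / `expected_steps` structures shown in the results below.

```python
def validate_steps(actual_steps, expected_steps):
    """Sketch of contains / not_contains validation over captured tool calls.

    actual_steps: list of {"tool": name, "input": {...}} dicts from the trace.
    expected_steps: {"contains": [...], "not_contains": [...]} validator config.
    Returns (passed, explanations).
    """
    explanations = []
    passed = True
    used = [step["tool"] for step in actual_steps]

    # contains: every required tool must appear somewhere in the trace.
    for expected in expected_steps.get("contains", []):
        if expected["name"] not in used:
            explanations.append(f"No step found with tool '{expected['name']}'.")
            passed = False

    # not_contains: forbidden tools must not appear in the trace.
    for forbidden in expected_steps.get("not_contains", []):
        if forbidden["name"] in used:
            explanations.append(f"Step found with tool '{forbidden['name']}'.")
            passed = False

    return passed, explanations


# Mirrors the failed run shown below: the agent called search_table
# instead of web_search, so both checks produce an explanation.
actual = [{"tool": "search_table", "input": {"query": "renewable energy trends"}}]
expected = {
    "contains": [{"name": "web_search"}],
    "not_contains": [{"name": "search_table"}],
}
ok, why = validate_steps(actual, expected)
```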

Evaluation Results

Successful Validation

When the agent uses the expected tools:
summary.json
{
  "total_files": 1,
  "total_tests": 1,
  "total_validations": 2,
  "inputs_passed": 0,
  "inputs_failed": 0,
  "outputs_passed": 0,
  "outputs_failed": 0,
  "steps_passed": 1,
  "steps_failed": 0,
  "execution_errors": 0,
  "tests_failed": []
}

Failed Validation

When the agent uses incorrect tools or misses required ones:
summary.json
{
  "total_files": 1,
  "total_tests": 1,
  "total_validations": 2,
  "inputs_passed": 0,
  "inputs_failed": 0,
  "outputs_passed": 0,
  "outputs_failed": 0,
  "steps_passed": 0,
  "steps_failed": 1,
  "execution_errors": 0,
  "tests_failed": [
    {
      "test_name": "eval_search_workflow",
      "test_path": "evals.yaml::eval_search_workflow",
      "input": {
        "prompt": [
          "Find information about renewable energy trends in the web"
        ]
      },
      "reason": [
        "steps"
      ],
      "execution_error": null,
      "input_passed": null,
      "input_explanations": [],
      "output_passed": null,
      "output_explanations": [],
      "actual_output": {
        "text": "I found information about renewable energy trends in the database table.",
        "files": []
      },
      "expected_output": null,
      "steps_passed": false,
      "steps_explanations": [
        "No step found with tool 'web_search'.",
        "Step found with tool 'search_table'."
      ],
      "actual_steps": [
        {
          "tool": "search_table",
          "input": {
            "query": "renewable energy trends"
          }
        }
      ],
      "expected_steps": {
        "contains": [
          {
            "name": "web_search"
          }
        ],
        "not_contains": [
          {
            "name": "search_table"
          }
        ]
      }
    }
  ]
}

Key Features

  • Tool Selection: Verify agents use the correct tools (contains, not_contains)
  • Tool Input: View and validate tool input parameters in actual_steps
  • Sequence Validation: Check tools are used in the expected order (equals, contains_ordered)
  • Semantic Validation: Use LLM to validate tool usage appropriateness (semantic)
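
For sequence and semantic checks, the validator block follows the same shape as the contains / not_contains example above. The sketch below is illustrative only: the tool name summarize_results is hypothetical, and the exact key format for the semantic validator's prompt may differ in your Timbal version.

evals.yaml
- name: eval_ordered_workflow
  description: Sketch of equals and semantic validators (tool names are illustrative)
  turns:
    - input: "Search the web, then summarize the results"
      steps:
        validators:
          # equals: the trace must match this exact step sequence
          equals:
            - name: web_search
            - name: summarize_results
          # semantic: an LLM judges whether the tool usage was appropriate
          semantic: "The agent should search the web before summarizing."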