Search Agent
agent.py
Running Evaluations
Validation Configuration
evals.yaml
How It Works
- Tool Execution: The agent executes tools during the turn, and each tool call is captured in the trace.
- Step Extraction: The evaluation engine extracts tool calls from the trace, including the tool name and input parameters.
- Validation: Validators check if the actual steps match the expected configuration:
  - contains: Verifies required tools are used
  - not_contains: Ensures unwanted tools are not used
  - equals: Validates exact step sequence
  - semantic: Uses LLM to validate tool appropriateness
- Input Parameters: Tool input parameters are captured and displayed in actual_steps, allowing you to verify correct parameter values.
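The flow above can be illustrated with a minimal Python sketch. It is not the evaluation engine's actual implementation: the trace shape, the tool names (web_search, summarize, delete_file), and the helper functions are all hypothetical, chosen only to show how step extraction and the contains, not_contains, and equals checks could relate to one another.

```python
from typing import Any

# Hypothetical trace: the real trace format is not shown in this document.
trace = [
    {"type": "message", "content": "Find recent articles on solar power"},
    {"type": "tool_call", "name": "web_search", "input": {"query": "solar power"}},
    {"type": "tool_call", "name": "summarize", "input": {"max_words": 100}},
]


def extract_steps(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only tool calls, recording the tool name and its input parameters."""
    return [
        {"tool": e["name"], "input": e.get("input", {})}
        for e in events
        if e.get("type") == "tool_call"
    ]


def contains(actual: list[str], required: list[str]) -> bool:
    """True if every required tool appears somewhere in the actual steps."""
    return all(tool in actual for tool in required)


def not_contains(actual: list[str], unwanted: list[str]) -> bool:
    """True if none of the unwanted tools appear in the actual steps."""
    return not any(tool in actual for tool in unwanted)


def equals(actual: list[str], expected: list[str]) -> bool:
    """True if the actual steps match the expected sequence exactly."""
    return actual == expected


actual_steps = extract_steps(trace)
tool_names = [step["tool"] for step in actual_steps]

print(contains(tool_names, ["web_search"]))
print(not_contains(tool_names, ["delete_file"]))
print(equals(tool_names, ["web_search", "summarize"]))
```

Because each extracted step keeps its input dictionary, a parameter check in this sketch is a plain lookup such as actual_steps[0]["input"]["query"].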
Evaluation Results
Successful Validation
When the agent uses the expected tools:
summary.json
Failed Validation
When the agent uses incorrect tools or misses required ones:
summary.json
Key Features
- Tool Selection: Verify agents use the correct tools (contains, not_contains)
- Tool Input: View and validate tool input parameters in actual_steps
- Sequence Validation: Check tools are used in the expected order (equals, contains_ordered)
- Semantic Validation: Use LLM to validate tool usage appropriateness (semantic)
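The difference between the two sequence validators can be sketched in Python. This document does not define the exact semantics of contains_ordered, so the sketch assumes it means "the expected tools appear in this relative order, with other tools allowed in between"; the function bodies and tool names are hypothetical, not the engine's code.

```python
def equals(actual: list[str], expected: list[str]) -> bool:
    """Exact match: same tools, same order, nothing extra."""
    return actual == expected


def contains_ordered(actual: list[str], expected: list[str]) -> bool:
    """Assumed semantics: expected is an ordered subsequence of actual."""
    remaining = iter(actual)
    # Each membership test consumes the iterator up to the match, so a
    # later expected tool can only match after an earlier one.
    return all(tool in remaining for tool in expected)


steps = ["web_search", "fetch_page", "summarize"]

print(equals(steps, ["web_search", "summarize"]))
print(contains_ordered(steps, ["web_search", "summarize"]))
print(contains_ordered(steps, ["summarize", "web_search"]))
```

Under this assumption, equals suits agents with a fixed pipeline, while contains_ordered tolerates extra intermediate tool calls as long as the key steps happen in sequence.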