Weather Assistant Agent
agent.py
Running Evaluations
Validation Configuration
evals.yaml
How It Works
- First Turn (Context): The first turn has no validators. It establishes conversation history by providing the user input and expected output. The agent receives this information but it’s not validated.
-
Second Turn (Validation): The last turn has validators for
stepsandoutput. - Memory: The agent receives the entire conversation history, so it can remember what was said in previous turns and answer accordingly.
Evaluation Results
Successful Validation
When the agent remembers context and provides the correct response:summary.json
Failed Validation
When the agent doesn’t use context or provides an incorrect response:summary.json
Key Features
- Context Memory: Previous turns establish conversation history that agents can access
- Last Turn Validation: Only the final turn is validated; earlier turns provide context
- Tool Usage Control: Verify agents don’t call tools unnecessarily when context is available
- Response Continuity: Ensure agents build logically on previous interactions