params.messages to establish conversation history and validate the agent’s response to the final message.
Example
This example demonstrates how to test multi-turn conversations by passing a full conversation history viaparams.messages and validating that the agent uses context appropriately.
Eval Configuration
evals.yaml
Agent Implementation
agent.py
Running Evaluations
How It Works
-
Conversation History: The
params.messagesarray establishes the full conversation history, including previous user messages and assistant responses. - Context Usage: The agent receives the entire conversation history, so it can remember what was said in previous turns and answer accordingly.
-
Output Validation: The
outputvalidator checks that the agent’s response contains the expected content (e.g., “yes” for the umbrella question). -
Sequence Validation: The
seq!validator ensures the agent only callsllmand doesn’t unnecessarily callget_weatheragain, since the weather information is already available in the conversation history.
Evaluation Results
Successful Validation
When the agent remembers context and provides the correct response:Failed Validation
When the agent doesn’t use context or provides an incorrect response:Key Features
- Conversation History: Use
params.messagesto establish full conversation context - Context Memory: Agents receive the entire conversation history and can remember previous interactions
- Sequence Validation: Use
seq!to verify agents don’t call tools unnecessarily when context is available - Response Continuity: Ensure agents build logically on previous interactions