agent.py
Running Evaluations
Multi-turn Memory Testing
evals.yaml
Example Evaluation Output
Successful Validation
summary.json
Failed Validation
summary.json
Key Features
- Memory Testing: Verify agents remember information from previous turns
- Context Continuity: Ensure responses build logically on previous interactions
- Task Progression: Test agents guide users through multi-step processes
- Conversation Flow: Validate natural, coherent dialogue progression