Output validation checks that your agent produces correct, well-formatted responses. You can require specific content, exclude unwanted text, match patterns with regex, and use LLM-powered semantic evaluation. The example on this page combines the first three with limits on execution time and token usage.

Creative Writing Agent

agent.py
from timbal import Agent

agent = Agent(
    name="creative_writer",
    model="openai/gpt-4.1-mini",
    system_prompt="""You are a creative writing assistant.
For any story request, always provide:
1. A compelling title
2. A complete short story (2 sentences)
3. A moral or lesson

Format your response as:
Title: [story title]
Story: [complete narrative]
Lesson: [moral or takeaway]"""
)

Running Evaluations

python -m timbal.eval --fqn agent.py::agent --tests evals.yaml

Validation Configuration

evals.yaml
- name: eval_creative_writer_response
  description: Validate creative writing agent provides well-structured stories
  turns:
    - input: "Write a story about a robot learning to paint"
      output:
        content:
          validators:
            contains:
              - "Title"
              - "Story"
              - "Lesson"
            not_contains:
              - "error"
              - "failed"
            regex: "^Title: .+"
        validators:
          time:
            max: 15.0
          usage:
            "gpt-4.1-mini:output_text_tokens":
              max: 500
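
To get a feel for what the three content validators are asking for, the checks can be approximated locally in plain Python. This is a minimal sketch, not Timbal's implementation: the `check_content` helper is hypothetical, and it assumes case-sensitive substring matching for `contains` and `not_contains` and `re.search` semantics for `regex`. It also reports every missing keyword, which may differ from how the evaluator reports failures.

```python
import re

def check_content(text, contains=(), not_contains=(), regex=None):
    """Return a list of failure messages; an empty list means every check passed.

    Hypothetical helper for illustration only; matching rules are assumptions.
    """
    failures = []
    for needle in contains:
        if needle not in text:
            failures.append(f"contains: missing {needle!r}")
    for needle in not_contains:
        if needle in text:
            failures.append(f"not_contains: found {needle!r}")
    if regex is not None and re.search(regex, text) is None:
        failures.append(f"regex: no match for {regex!r}")
    return failures

# Same rules as the content validators in evals.yaml.
rules = dict(
    contains=["Title", "Story", "Lesson"],
    not_contains=["error", "failed"],
    regex=r"^Title: .+",
)

# Illustrative responses written for this sketch, not real agent output.
good = (
    "Title: The First Brushstroke\n"
    "Story: A robot painted. It kept practicing.\n"
    "Lesson: Skill grows with patience."
)
bad = (
    "**Title:** The First Brushstroke\n\n"
    "Through trial and error, it learned.\n\n"
    "**Moral:** Keep exploring."
)

good_failures = check_content(good, **rules)
bad_failures = check_content(bad, **rules)

print(good_failures)
for failure in bad_failures:
    print(failure)
```

The second response fails for ordinary reasons: Markdown bold in front of the label breaks the `^Title: ` anchor, and the phrase "trial and error" contains the excluded word `error`.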

Evaluation Results

Successful Validation

When all validators pass, the test is counted under outputs_passed and tests_failed is empty:
summary.json
{
  "total_files": 1,
  "total_tests": 1,
  "total_validations": 5,
  "inputs_passed": 0,
  "inputs_failed": 0,
  "outputs_passed": 1,
  "outputs_failed": 0,
  "steps_passed": 0,
  "steps_failed": 0,
  "execution_errors": 0,
  "tests_failed": []
}

Failed Validation

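When any validator fails, the test is counted under outputs_failed and tests_failed records one explanation per failed validator, together with the actual and expected output. In the run below, the agent wrapped its title label in Markdown bold, omitted the Story label, and used the phrase "trial and error", which trips the regex, contains, and not_contains validators respectively: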
summary.json
{
  "total_files": 1,
  "total_tests": 1,
  "total_validations": 5,
  "inputs_passed": 0,
  "inputs_failed": 0,
  "outputs_passed": 0,
  "outputs_failed": 1,
  "steps_passed": 0,
  "steps_failed": 0,
  "execution_errors": 0,
  "tests_failed": [
    {
      "test_name": "eval_creative_writer_response",
      "test_path": "evals.yaml::eval_creative_writer_response",
      "input": {
        "prompt": [
          "Write a story about a robot learning to paint"
        ]
      },
      "reason": [
        "output"
      ],
      "execution_error": null,
      "input_passed": null,
      "input_explanations": [],
      "output_passed": false,
      "output_explanations": [
        "Validator contains: Message does not contain 'Story'.",
        "Validator not_contains: Message contains 'error'.",
        "Validator regex: Message does not match regex '^Title: .+'."
      ],
      "actual_output": {
        "text": "**Title:** The Robot’s First Brushstroke\n\nIn a quiet studio, a robot carefully dipped its synthetic fingers into vibrant paint, surprising everyone when it created a masterpiece filled with emotion and color despite never feeling a single feeling itself. Through trial and error, it learned that art wasn’t just about precision but about expressing something beyond programming.\n\n**Moral:** Creativity springs not only from emotions but from the courage to explore and grow beyond one’s limits.",
        "files": []
      },
      "expected_output": {
        "time": {
          "validators": {
            "time": {
              "max": 15.0
            }
          }
        },
        "usage": {
          "validators": {
            "usage": {
              "gpt-4.1-mini:output_text_tokens": {
                "max": 500
              }
            }
          }
        },
        "content": {
          "validators": {
            "contains": [
              "Title",
              "Story",
              "Lesson"
            ],
            "not_contains": [
              "error",
              "failed"
            ],
            "regex": "^Title: .+"
          }
        }
      },
      "steps_passed": null,
      "steps_explanations": [],
      "actual_steps": [],
      "expected_steps": null
    }
  ]
}
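
A summary like the one above is easy to turn into a short failure report, for example in a CI log. The sketch below is an assumption-laden illustration: `failure_report` is a hypothetical helper, the JSON is an abridged copy of the failed run embedded as a string, and only field names that appear in the summary above are used.

```python
import json

# Abridged copy of the failed-run summary; only the fields the report needs.
SUMMARY_JSON = """
{
  "total_tests": 1,
  "outputs_passed": 0,
  "outputs_failed": 1,
  "tests_failed": [
    {
      "test_path": "evals.yaml::eval_creative_writer_response",
      "reason": ["output"],
      "input_explanations": [],
      "output_explanations": [
        "Validator contains: Message does not contain 'Story'.",
        "Validator not_contains: Message contains 'error'.",
        "Validator regex: Message does not match regex '^Title: .+'."
      ],
      "steps_explanations": []
    }
  ]
}
"""

def failure_report(summary):
    """Flatten tests_failed into printable lines (hypothetical helper)."""
    lines = []
    for test in summary["tests_failed"]:
        reasons = ", ".join(test["reason"])
        lines.append(f"{test['test_path']} failed on: {reasons}")
        # Collect explanations from every validation stage that has them.
        for key in ("input_explanations", "output_explanations", "steps_explanations"):
            for explanation in test.get(key, []):
                lines.append(f"  - {explanation}")
    return lines

report = failure_report(json.loads(SUMMARY_JSON))
print("\n".join(report))
```

To use it on a real run, replace the embedded string with the contents of the summary.json file produced by the evaluation.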

Key Features

  • Content Validation: Require keywords with contains and exclude unwanted text with not_contains
  • Format Validation: Check that responses follow the expected structure with regex patterns
  • Time Validation: Limit execution time with the top-level time validator
  • Usage Validation: Cap resource consumption, such as output tokens, with the top-level usage validator
  • Combined Validators: Use per-key validators (content.validators) and top-level validators (output.validators) together
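
The regex in this example only anchors the title line. A stricter pattern can check that all three labels appear in order, and it is worth trying candidate patterns locally before putting them in evals.yaml. The sketch below assumes the pattern is evaluated with Python's `re` module using search semantics, which the example does not confirm; `STRICT_FORMAT` and `follows_format` are hypothetical names.

```python
import re

# Hypothetical stricter pattern: Title at the start of the text, then Story and
# Lesson each at the start of a later line. (?s) lets "." span newlines so a
# multi-line section still matches.
STRICT_FORMAT = r"(?s)^Title: .+\nStory: .+\nLesson: .+"

def follows_format(text):
    """Return True when the text has the three labeled sections in order."""
    return re.search(STRICT_FORMAT, text) is not None

# Illustrative responses written for this sketch.
well_formed = (
    "Title: The First Brushstroke\n"
    "Story: A robot painted. It kept practicing.\n"
    "Lesson: Skill grows with patience."
)
markdown_bold = "**Title:** The First Brushstroke\n\n**Moral:** Keep exploring."
wrong_order = (
    "Title: The First Brushstroke\n"
    "Lesson: Skill grows with patience.\n"
    "Story: A robot painted."
)

print(follows_format(well_formed))    # True
print(follows_format(markdown_bold))  # False
print(follows_format(wrong_order))    # False
```

A stricter pattern catches more formatting drift, but it also fails more often on harmless variations, so the system prompt and the pattern need to agree exactly on the expected layout.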