What is Thinking?

Thinking (also called reasoning) lets a model work through a problem step by step before it generates its final response. Models that support thinking can expose this reasoning process, and the extra deliberation often produces more accurate answers on complex problems.
Not all models support thinking capabilities. See the Model Capabilities page to check which models have thinking support.

Using Thinking

Thinking is configured via model_params when creating an agent. The parameter key differs by provider.

Anthropic Models

For Anthropic models, pass a thinking dict with type and budget_tokens inside model_params. max_tokens is a required top-level field:
agent = Agent(
    name="reasoning_agent",
    model="anthropic/claude-sonnet-4-5",
    max_tokens=20000,
    model_params={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000  # Must be >= 1024 and < max_tokens
        }
    }
)
Parameters:
  • type: Must be "enabled" to activate thinking
  • budget_tokens: The maximum number of tokens the model can use for reasoning (must be >= 1024 and < max_tokens)
Important: With Anthropic models, budget_tokens must be less than max_tokens because thinking tokens count toward the total output limit. max_tokens caps the combined output (thinking plus the final response), while budget_tokens reserves part of that cap for reasoning. Set max_tokens high enough to leave room for the final response after the thinking budget is spent.
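The budget rule above can be checked before an agent is created. The following is a minimal sketch, not part of the library: the helper name anthropic_thinking_params and the constant MIN_THINKING_BUDGET are hypothetical, and only the dict shape and the two limits come from this page.

```python
MIN_THINKING_BUDGET = 1024  # lower bound stated on this page


def anthropic_thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Hypothetical helper: build a model_params dict that enables thinking.

    Raises ValueError if budget_tokens breaks either documented limit.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be >= {MIN_THINKING_BUDGET}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be < max_tokens ({max_tokens})"
        )
    return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}


# Same numbers as the example above: 20000 total, 10000 reserved for thinking.
params = anthropic_thinking_params(max_tokens=20000, budget_tokens=10000)

# Tokens left for the final response if the whole thinking budget is used.
response_headroom = 20000 - 10000
```

The returned dict can be passed as model_params, with the same max_tokens value passed as the top-level field.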

OpenAI Models

For OpenAI models (both o-series reasoning models and GPT models that support thinking), pass a reasoning dict inside model_params:
agent = Agent(
    name="reasoning_agent",
    model="openai/gpt-5",
    model_params={
        "reasoning": {
            "effort": "high",   # Options: "none", "low", "medium", "high", "xhigh"
            "summary": "auto"   # Options: "auto", "concise", "detailed"
        }
    }
)
Parameters:
  • effort: Controls how much computational effort the model puts into reasoning
    • "none" - No reasoning (default for non-reasoning-focused calls)
    • "low" - Light reasoning
    • "medium" - Substantial reasoning
    • "high" - Deep reasoning
    • "xhigh" - Maximum reasoning effort
  • summary: Controls the level of detail of the reasoning summary streamed alongside the response
    • "auto" - Automatic summary
    • "concise" - Brief summary
    • "detailed" - Comprehensive summary
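Because effort and summary are plain strings, a typo is easy to miss until the request is sent. The sketch below validates both values against the options listed above before building the dict. The helper name openai_reasoning_params and the two tuples are hypothetical and not part of the library.

```python
# Option lists copied from this page.
ALLOWED_EFFORT = ("none", "low", "medium", "high", "xhigh")
ALLOWED_SUMMARY = ("auto", "concise", "detailed")


def openai_reasoning_params(effort: str, summary: str) -> dict:
    """Hypothetical helper: build a model_params dict with a reasoning config.

    Raises ValueError if either value is not one of the documented options.
    """
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORT}, got {effort!r}")
    if summary not in ALLOWED_SUMMARY:
        raise ValueError(f"summary must be one of {ALLOWED_SUMMARY}, got {summary!r}")
    return {"reasoning": {"effort": effort, "summary": summary}}


reasoning_params = openai_reasoning_params(effort="high", summary="auto")
```

The result has the same shape as the model_params value in the example above.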

Overriding Thinking Per Request

You can override the thinking configuration for individual requests by passing provider_params at call time. This takes priority over the agent’s default model_params.
# Agent with no thinking by default
agent = Agent(
    name="my_agent",
    model="anthropic/claude-sonnet-4-5",
    max_tokens=20000,
)

# Enable thinking for this specific request (Anthropic)
result = await agent(
    prompt="Solve this complex problem",
    provider_params={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 5000
        }
    }
).collect()

# Agent with no thinking by default (OpenAI example)
agent = Agent(
    name="my_agent",
    model="openai/o3-mini",
)

# Enable thinking for this specific request (OpenAI)
result = await agent(
    prompt="Solve this complex problem",
    provider_params={
        "reasoning": {
            "effort": "high",
            "summary": "auto"
        }
    }
).collect()
Note: provider_params passed at call time completely replaces the agent’s default model_params for that request. If the agent sets other provider-specific params via model_params and you still need them, include them in the per-call provider_params as well.
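Since the per-call value replaces the defaults rather than merging with them, one way to keep the agent's other params is to merge the two dicts yourself before the call. This is a minimal sketch under that assumption: merged_provider_params is a hypothetical helper, and "example_param" is a placeholder key, not a real provider option.

```python
import copy


def merged_provider_params(defaults: dict, overrides: dict) -> dict:
    """Hypothetical helper: combine agent defaults with per-call overrides.

    The merge is shallow at the top level, so a per-call "thinking" or
    "reasoning" entry replaces the default entry with the same key as a whole.
    Both inputs are deep-copied so neither is mutated.
    """
    merged = copy.deepcopy(defaults)
    merged.update(copy.deepcopy(overrides))
    return merged


# Placeholder defaults standing in for whatever the agent's model_params holds.
agent_defaults = {"example_param": "keep-me"}

# Per-call override that enables thinking for one request.
per_call = {"thinking": {"type": "enabled", "budget_tokens": 5000}}

call_params = merged_provider_params(agent_defaults, per_call)
```

The merged dict would then be passed as provider_params in the call, so the request carries both the default entries and the thinking override.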

When to Use Thinking

Thinking is particularly useful for:
  • Complex problem-solving - Multi-step reasoning tasks
  • Mathematical problems - Calculations requiring step-by-step work
  • Code analysis - Understanding and debugging complex code
  • Strategic planning - Long-term thinking and planning
  • Scientific reasoning - Hypothesis testing and analysis
Thinking consumes more tokens and may increase response time, but it often produces more accurate and better-reasoned responses for complex tasks.