Skip to main content
Memory compaction is in beta. The API may change in future releases.
As conversations grow longer, the accumulated message history can exceed a model’s context window. Memory compaction lets you define strategies that automatically trim or condense that history before each agent call, keeping token usage under control without changing how you call the agent.

How It Works

Compaction is triggered automatically when context-window utilization exceeds memory_compaction_ratio (default 0.75, i.e. 75%), at two points:
  • Turn start — before the first LLM call of a turn, utilization is estimated from the previous run’s token usage. The configured compactors are applied to the resolved memory before it reaches the LLM.
  • Mid-loop — between iterations within a turn (after each tool round), utilization is read from the previous LLM call’s reported usage. This bounds a single turn that makes many or large tool calls instead of letting it grow until it overflows. Mid-loop compaction always protects the most recent, still-unconsumed assistant tool batch (the results the next model call must read), so it never sends the agent back to re-plan the same step.
Both use the same strategies and the same ratio; you don’t configure them separately.
from timbal import Agent
from timbal.core.memory_compaction import keep_last_n_turns

agent = Agent(
    name="my_agent",
    model="openai/gpt-4o-mini",
    memory_compaction=keep_last_n_turns(10),
    memory_compaction_ratio=0.75,  # trigger when previous run used >75% of context
)
Set memory_compaction_ratio=0.0 to always compact, or 1.0 to effectively disable auto-triggering.

Built-in Strategies

Import strategies from timbal.core.memory_compaction.

keep_last_n_turns(n)

Keeps only the last n user/assistant turn pairs. Structure-aware: never leaves orphaned tool calls or results.
from timbal.core.memory_compaction import keep_last_n_turns

agent = Agent(
    name="my_agent",
    model="openai/gpt-4o-mini",
    memory_compaction=keep_last_n_turns(5),
)

keep_last_n_messages(n)

Keeps only the last n messages regardless of role. Also structure-aware.
from timbal.core.memory_compaction import keep_last_n_messages

agent = Agent(
    name="my_agent",
    model="openai/gpt-4o-mini",
    memory_compaction=keep_last_n_messages(20),
)

compact_tool_results(...)

Reduces the size of tool call history. Useful when tools return large payloads that are no longer needed verbatim.
from timbal.core.memory_compaction import compact_tool_results

agent = Agent(
    name="my_agent",
    model="openai/gpt-4o-mini",
    memory_compaction=compact_tool_results(
        keep_last_n=2,       # keep last 2 tool call pairs intact
        threshold=10,        # only apply when memory exceeds 10 messages
        replacement="[Tool '{tool_name}' result truncated ({result_length} chars)]",
    ),
)
replacement controls what happens to compacted tool results:
  • None (default) — drop tool results and their corresponding tool_use entries entirely. Assistant messages that become empty are also dropped.
  • str — replace each result with a template string. Supported placeholders: {tool_name}, {call_id}, {result_length}.
  • callable(tool_name, call_id, result_text) -> str — call a function per result and use the return value as the replacement.
# Drop all tool results
compact_tool_results()

# Replace with a short summary string
compact_tool_results(replacement="[{tool_name}: {result_length} chars]")

# Custom replacement logic
def shorten(tool_name, call_id, result_text):
    return f"[{tool_name}]: {result_text[:100]}..."

compact_tool_results(replacement=shorten)

summarize(...)

Summarizes old messages into a single context message using an LLM call. Uses incremental summarization: on subsequent runs, only the new overflow messages are sent to the summarizer, which updates the existing summary rather than regenerating it from scratch. System messages are always preserved and never included in summarization.
from timbal.core.memory_compaction import summarize

agent = Agent(
    name="my_agent",
    model="openai/gpt-4o-mini",
    memory_compaction=summarize(
        threshold=20,                     # summarize when non-system messages exceed 20
        model="openai/gpt-4o-mini",       # model used for summarization (defaults to agent's model)
        keep_last_n=4,                    # keep last 4 messages unsummarized
        max_summary_tokens=500,
    ),
)
Use a smaller, cheaper model for summarization to reduce cost. For example, model="openai/gpt-5.4-nano" works well for summarization tasks.

Pinned results

Some tool results are durable context the model must keep referencing — for example loaded skill documentation. Those results are pinned: every compaction strategy preserves them verbatim (and never orphans their paired tool call), regardless of keep_last_n, turn windows, or drop/replacement mode. For keep_last_n_messages / keep_last_n_turns the effective behavior is “last N plus pinned”; for summarize, pinned results are kept verbatim and never fed to the summarizer (like system messages). You opt a tool in declaratively with pin_result=True:
from timbal.core.tool import Tool

tool = Tool(
    name="load_policy",
    handler=load_policy,
    pin_result=True,  # results survive compaction for the life of the conversation
)
The built-in read_skill tool sets this automatically, so skill guidance never gets compacted away. Pinning is durable across pause/resume — the flag is persisted with the trace.

Composing Strategies

Pass a list of compactors to apply them in order. Each compactor receives the output of the previous one.
from timbal.core.memory_compaction import compact_tool_results, keep_last_n_turns

agent = Agent(
    name="my_agent",
    model="openai/gpt-4o-mini",
    memory_compaction=[
        compact_tool_results(keep_last_n=2),  # first: shrink tool results
        keep_last_n_turns(10),                # then: trim to last 10 turns
    ],
)

Observability

When compaction fires, the agent span records a compaction key in its metadata:
{
  "triggered": true,
  "utilization": 0.91,
  "steps": [
    { "compactor": "compact_tool_results", "before": 42, "after": 28 },
    { "compactor": "keep_last_n_turns",    "before": 28, "after": 12 }
  ],
  "passes": 1
}
  • utilization — the context window fraction that triggered the most recent pass (null when memory_compaction_ratio=0.0).
  • steps — one entry per compactor with message counts before and after (for the most recent pass).
  • passes — how many times compaction fired on this span (turn start plus each mid-loop pass).
This data is visible in the Timbal platform trace viewer alongside the other span metadata.