parent_id (the paused run id) and resume={...}. In-process this works out of the box; across processes you need a durable provider; and you can enumerate what’s pending either way.
Cancelling instead of answering
A human won’t always say yes or no. Sometimes they close the dialog, hit Escape, or abandon the task. That’s a cancel, and it’s different from a deny:- Deny / decline continues the run. An approval
approved=Falseis fed back to the model as a tool result so the agent can apologize or try another path; asuspend()decline is just a value your handler interprets. - Cancel aborts the entire run. Nothing is fed back to the model.
Cancel (works on either an approval or a suspension id) to abort:
Cancel keyed to any pending id in a batch tears down the whole run.
Across processes (durable providers)
Everything above works in-process by default. For “pause in a UI now, resume in a worker later” (the real human-in-the-loop shape) switch off the defaultInMemoryTracingProvider (which only resumes within the same Python process) to a durable provider. This applies identically to approval gates and suspend().
- JSONL (local / single host)
- SQLite (single host, higher throughput)
- Timbal Platform (multi-host)
JsonlTracingProvider writes one record per run and uses a sidecar lock file (traces.jsonl.approval_claims.json + .lock via fcntl) for cross-process approval claims. Good for local dev and single-host deployments. Not recommended for high-throughput production: _store() rewrites the file on each run.parent_id:
Memory compaction on resume
If the agent hasmemory_compaction configured, a resume turn is treated like a continuation of the paused turn, not a fresh one. The loaded memory ends with the gated/suspended tool_use that has no tool_result yet — that trailing block is exactly what Timbal re-executes to settle the pause without re-calling the model.
Compaction is structure-aware about this:
- The pending
tool_useis never compacted away. (A naive pass would treat it as an orphaned tool call, strip it, and force the model to re-plan — which is nondeterministic and would silently drop the human’s decision.) - The history before the pending call is still compacted, so resuming a long paused thread doesn’t carry the full uncompacted transcript into the continuation LLM call and overflow the context window.
compact_tool_results, keep_last_n_messages, keep_last_n_turns, summarize) and any custom compactor. You don’t configure anything extra — it just works on the same resume= call.
Compaction runs at turn boundaries and between iterations within a turn (mid-loop), so a single turn that makes many or large tool calls is compacted as it grows once utilization crosses
memory_compaction_ratio. Mid-loop compaction always protects the most recent (unconsumed) assistant tool batch — the results the next model call must read — so it bounds context without sending the agent back to re-plan the same step.In a Workflow, compaction is a per-step concern: a Workflow has no LLM context of its own, and an Agent used as a step runs with isolated context (no cross-turn memory) but still mid-loop-compacts a long tool loop inside that step. Step outputs passed between steps are not “context” and are never compacted.Duplicate worker protection
When multiple workers consume the same queue, two of them might race to resume the same(parent_id, approval_id). Timbal claims the pair before executing the resolution. The first claimer wins; later duplicates stop before handler execution with status.reason == "approval_already_claimed".
JsonlTracingProvider and SqliteTracingProvider out of the box. Custom providers must override claim_approval(parent_id, approval_id, run_id) to get the same durable-lock behavior; the base class default is a no-op.
Enumerating what’s pending
When a run cancels and the runnable had multiple concurrent calls, each pause emits its own event. There are two ergonomic ways to enumerate them, and the same shape applies to both approvals and interactions. During the stream, capture every event:.collect(), read the lists the collector attaches to OutputEvent.metadata (the status only references the first pause):
RunContext.pending_approvals() and RunContext.pending_interactions() walk RunContext._trace directly. They tolerate both live RunStatus and dict-after-reload shapes, so they work against in-memory, JSONL, SQLite, and platform traces. Approval entries use the redacted input snapshot, never the raw secrets.
metadata["pending_*"] is added by .collect(). Over the HTTP server (which streams raw events) the frontend reads the APPROVAL / INTERACTION events directly instead. See Client integration (HTTP).See also
- Approval gates — declarative gates for irreversible actions
- Suspend & interaction tools — ask the user for arbitrary input
- Client integration (HTTP) — the
/streamwire contract