Audio - Timbal

Transcribe audio files in a pre_hook before processing.

This example uses OpenAI’s transcription API, but you can use any transcription service (ElevenLabs, Google Cloud, or custom implementations).

The [Audio] prefix is added to the transcribed text to mark it as originating from an audio file. This tells the agent where the input came from, which is useful for:

Context Awareness: The agent knows the text came from audio transcription
Mixed Content: When combining text and audio in the same prompt, the prefix distinguishes transcribed content
Traceability: Makes it easier to track which parts of the conversation came from audio vs. text input
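As a minimal sketch of the mixed-content case, the snippet below combines typed text and transcribed audio into one prompt string. The helper names `tag_transcription` and `build_prompt` and the `(kind, text)` tuple layout are hypothetical, chosen only for illustration; they are not part of Timbal.

```python
AUDIO_PREFIX = "[Audio]"  # same marker used by the pre_hook in this example


def tag_transcription(transcription: str, prefix: str = AUDIO_PREFIX) -> str:
    """Return the transcription with a source prefix, e.g. '[Audio]: hello'."""
    return f"{prefix}: {transcription.strip()}"


def build_prompt(parts: list[tuple[str, str]]) -> str:
    """Join (kind, text) parts into one prompt; kind is 'text' or 'audio'.

    Hypothetical helper: only audio parts receive the prefix, so the
    transcribed content stays distinguishable from typed input.
    """
    lines = []
    for kind, text in parts:
        lines.append(tag_transcription(text) if kind == "audio" else text.strip())
    return "\n".join(lines)


prompt = build_prompt([
    ("text", "Summarize the voicemail below."),
    ("audio", " Hi, it's Sam, call me back. "),
])
print(prompt)
```

Only the second line of the resulting prompt carries the prefix, so the agent can tell the typed instruction apart from the transcribed recording.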

from timbal import Agent
from timbal.state import get_run_context
from timbal.types.file import File
from timbal.types.content import content_factory
import os
from openai import AsyncOpenAI

async def stt(audio_file: File) -> str:
    """Transcribe an audio file."""
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    transcript = await client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
    return transcript.text

async def pre_hook():
    """Transcribe audio files before processing."""
    span = get_run_context().current_span()
    prompt = span.input.get("prompt")
    
    # Transcribe audio file and add prefix
    if (isinstance(prompt, File) and 
        prompt.__content_type__ and 
        prompt.__content_type__.startswith("audio/")):
        transcription = await stt(prompt)
        span.input["prompt"] = content_factory(f"[Audio]: {transcription}")

agent = Agent(
    name="AudioAgent",
    model="openai/gpt-4.1-mini",
    pre_hook=pre_hook
)

audio_file = File.validate("/path/to/recording.wav")
result = await agent(prompt=audio_file).collect()

The stt function above calls OpenAI’s transcription API directly, with no retries or error handling. For more advanced features such as language detection, timestamps, and more robust error handling, refer to the OpenAI Audio API documentation.
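One possible way to add basic error handling around the transcription call is a small retry wrapper with exponential backoff. This is a sketch under assumptions: `with_retries` and `flaky_stt` are hypothetical names, the wrapper is provider-agnostic, and in practice `retry_on` would likely be narrowed to the transient error types your transcription client raises.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Await call(), retrying with exponential backoff when it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


# Stand-in for a transcription call that fails twice, then succeeds
calls = {"count": 0}


async def flaky_stt() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network failure")
    return "hello world"


text = asyncio.run(with_retries(flaky_stt, attempts=3, base_delay=0.0))
print(text, calls["count"])
```

Inside the pre_hook, the same wrapper could be applied as `await with_retries(lambda: stt(prompt))`, leaving the rest of the hook unchanged.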

Key Features

Pre-hook Transcription: Audio is transcribed before the agent processes it
Any Model: Works with any text model, not just audio-capable ones
Flexible Providers: Use any transcription service (OpenAI, ElevenLabs, or custom)
Audio Prefix: The “[Audio]” prefix clearly indicates transcribed content
File Support: Works with local files, URLs, and base64 data
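To illustrate the flexible-providers point, the sketch below treats the transcription service as any async callable from audio bytes to text, so one provider can be swapped for another without touching the prefixing logic. The `Transcriber` alias, `make_audio_tagger`, and `fake_transcriber` are hypothetical names for illustration and are not part of Timbal or any provider SDK.

```python
import asyncio
from typing import Awaitable, Callable

# Assumption: a provider is any async function mapping raw audio bytes to text
Transcriber = Callable[[bytes], Awaitable[str]]


def make_audio_tagger(
    transcribe: Transcriber, prefix: str = "[Audio]"
) -> Callable[[bytes], Awaitable[str]]:
    """Wrap a provider so its output carries the audio prefix."""

    async def tag(audio: bytes) -> str:
        text = await transcribe(audio)
        return f"{prefix}: {text}"

    return tag


async def fake_transcriber(audio: bytes) -> str:
    """Stand-in provider; a real one would call OpenAI, ElevenLabs, etc."""
    return f"{len(audio)} bytes of audio"


tagged = asyncio.run(make_audio_tagger(fake_transcriber)(b"\x00\x01\x02"))
print(tagged)
```

Swapping providers then means passing a different callable to `make_audio_tagger`; the agent still receives plain prefixed text.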
