Adding Voice

Timbal agents can understand and generate speech using a variety of voice providers. This enables use cases like voice assistants, audio chatbots, and speech-to-speech interactions.

Using a Single Provider

You can add voice capabilities to your agent using a single provider, such as OpenAI or ElevenLabs.

Speech-to-Text (STT): Convert audio to text.

from timbal.handlers.elevenlabs import stt
from timbal.types import File

audio_file = File.validate("path/to/audio.wav")
transcription = await stt(audio_file=audio_file)
print(transcription)

Text-to-Speech (TTS): Convert text to audio.

from timbal.handlers.elevenlabs import tts

audio_file = await tts(
    text="Hello, how are you?",
    voice_id="your-voice-id"
)
# audio_file is a File object containing the generated audio
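
Both snippets above use `await` at the top level, which works in a notebook or an async REPL but not in a plain script. A minimal sketch of running such a call from a script follows. It uses a stand-in coroutine, `fake_stt`, which is hypothetical and not part of Timbal, so that it runs without provider credentials; in real code the awaited call would be the handler shown above.

```python
import asyncio


async def fake_stt(audio_file: str) -> str:
    """Hypothetical stand-in for an STT handler; returns a canned transcript."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return f"transcript of {audio_file}"


async def main() -> str:
    # In a real script, this line would await the provider handler instead.
    return await fake_stt(audio_file="path/to/audio.wav")


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```

The same wrapper applies to the TTS call: put the `await` inside an `async def` and start it with `asyncio.run`.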

Using Multiple Providers

You can mix and match providers for STT and TTS. For example, use OpenAI for transcription and ElevenLabs for speech generation.

from timbal.handlers.openai import stt as openai_stt
from timbal.handlers.elevenlabs import tts as elevenlabs_tts
from timbal.types import File

# Transcribe with OpenAI
audio_file = File.validate("https://cdn.openai.com/API/docs/audio/alloy.wav")
audio_text = await openai_stt(audio_file=audio_file)

# Synthesize with ElevenLabs
audio_response = await elevenlabs_tts(
    text=audio_text,
    voice_id="21m00Tcm4TlvDq8ikWAM"
)

Speech-to-Speech Voice Interactions

You can build agents that both understand and respond in audio. For example, an agent that receives audio, transcribes it, generates a response, and then synthesizes speech:

from timbal import Agent
from timbal.handlers.elevenlabs import tts

agent = Agent(
    name="agent",
    model="openai/gpt-4.1-mini",
    tools=[tts],
    system_prompt=(
        "You are a helpful assistant and you must always respond in audio format. "
        "Always use '56AoDkrOh6qfVPDXZ7Pt' as the voice_id for the TTS model."
    )
)

response = await agent(prompt="How are you?").collect()
# response.output will be a File (audio) if TTS is used
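
The receive, transcribe, respond, synthesize flow described in this section can also be sketched as a plain function that takes each stage as an async callable. This is an illustration under stated assumptions, not Timbal's implementation: `speech_to_speech`, `echo_stt`, `echo_reply`, and `echo_tts` are hypothetical names, and the stand-ins only manipulate bytes and strings so the sketch runs without credentials. Real STT and TTS handlers and an agent call would be passed in their place.

```python
import asyncio
from typing import Awaitable, Callable

# Each stage is an async callable, so any provider can be plugged in.
Transcribe = Callable[[bytes], Awaitable[str]]
Respond = Callable[[str], Awaitable[str]]
Synthesize = Callable[[str, str], Awaitable[bytes]]


async def speech_to_speech(
    audio: bytes,
    transcribe: Transcribe,
    respond: Respond,
    synthesize: Synthesize,
    voice_id: str,
) -> bytes:
    """Run audio -> text -> reply text -> audio using the given stages."""
    text = await transcribe(audio)
    reply = await respond(text)
    return await synthesize(reply, voice_id)


# Hypothetical stand-ins (not Timbal APIs) used only to exercise the pipeline.
async def echo_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


async def echo_reply(text: str) -> str:
    return f"You said: {text}"


async def echo_tts(text: str, voice_id: str) -> bytes:
    return f"[{voice_id}] {text}".encode("utf-8")


def run_demo() -> bytes:
    """Drive the pipeline once with the stand-in stages."""
    return asyncio.run(
        speech_to_speech(b"hello", echo_stt, echo_reply, echo_tts, voice_id="demo-voice")
    )
```

Keeping the stages as parameters means the STT and TTS providers can differ, as in the multiple-provider example, without changing the pipeline itself.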

Supported Voice Providers

Timbal supports multiple providers for both STT and TTS:

OpenAI: High-quality transcription and speech synthesis.
ElevenLabs: Advanced, natural-sounding voices and robust transcription.
(More providers coming soon!)

For more details, see the ElevenLabs Integration and OpenAI Integration pages.
