Using Voice with Agents

Giving your Agent a Voice

Timbal agents can be enhanced with voice capabilities, enabling them to speak and listen. This example demonstrates how to configure voice functionality for your agents.

Prerequisites

This example uses an OpenAI model (openai/gpt-4.1) together with the OpenAI speech handlers. Add OPENAI_API_KEY to your .env file.

.env
OPENAI_API_KEY=your_api_key_here

Voice Tools

Create speech-to-text (STT) and text-to-speech (TTS) tools that the agent can use to transcribe audio input and generate audio responses:

Speech-to-Text Tool

from timbal.core import Tool
from timbal.handlers.openai.stt import stt

# Create the STT tool; the agent sees it under the name "speech_to_text"
stt_tool_instance = Tool(
    name="speech_to_text",
    description="Convert audio input to text.",
    handler=stt,
)

Text-to-Speech Tool

from timbal.core import Tool
from timbal.handlers.openai.tts import tts

# Create the TTS tool; the agent sees it under the name "text_to_speech"
tts_tool_instance = Tool(
    name="text_to_speech",
    description="Convert text to speech.",
    handler=tts,
)

Voice-Enabled Agent

Create an agent with voice tools that can read audio files and respond with audio:

from timbal.core import Agent

# The system prompt refers to the tools by their registered names
# ("speech_to_text", "text_to_speech"), not by the Python variable names.
voice_agent = Agent(
    name="voice-agent",
    description="An agent with voice capabilities for speaking and listening",
    system_prompt="""You are a voice-enabled agent. You MUST follow these steps:
1. Convert the audio input to text using the speech_to_text tool.
2. Process the text and generate a response.
3. Convert your response to speech using the text_to_speech tool.
Always use the speech_to_text tool when you receive audio input, and use the text_to_speech tool to respond with audio.""",
    model="openai/gpt-4.1",
    tools=[stt_tool_instance, tts_tool_instance],
)

Example usage

This example shows how the agent can process audio input and respond with audio:

import asyncio

from timbal.types.file import File


async def main():
    # Example: the agent receives audio input and responds with audio.
    # 1. Create a message with audio input.
    #    Use a reliable sample audio file for testing.
    audio_file = File.validate("https://cdn.openai.com/API/docs/audio/alloy.wav")
    prompt = [audio_file, "Please listen to this audio and respond with speech."]

    # 2. The agent processes the audio and responds with audio.
    response = await voice_agent(prompt=prompt).collect()

    # 3. The response will contain audio content.
    print("Agent processed audio input and generated audio response!")

    # 4. Save the audio response.
    if response.output.content:
        for content in response.output.content:
            if hasattr(content, "file") and content.file:
                output_path = "agent_response.mp3"
                content.file.to_disk(output_path)
                print(f"Audio response saved to: {output_path}")


if __name__ == "__main__":
    asyncio.run(main())
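
The loop in step 4 writes every file part to the same path, so a response with more than one audio part would keep only the last one. A minimal sketch of a helper that gives each part its own numbered file is shown below. The helper name save_file_parts and its parameters are hypothetical (not part of Timbal); it only assumes what the example above already relies on, namely that a content part may expose a file attribute with a to_disk(path) method. The .mp3 suffix is carried over from the example and may not match the actual format your TTS handler returns.

```python
from pathlib import Path


def save_file_parts(contents, out_dir=".", stem="agent_response", suffix=".mp3"):
    """Write every content part that carries a file to its own numbered path.

    Hypothetical helper: `contents` is any iterable of objects that may have
    a `file` attribute exposing `to_disk(path)`, as in the example above.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    saved = []
    for part in contents or []:
        file = getattr(part, "file", None)
        if not file:
            continue  # skip text parts and empty file slots
        path = out / f"{stem}_{len(saved)}{suffix}"
        file.to_disk(str(path))  # same call the example above uses
        saved.append(path)
    return saved
```

Inside main(), this could replace step 4 with something like saved = save_file_parts(response.output.content), followed by printing the returned paths.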

Available voice handlers

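Timbal provides built-in voice handlers for two providers:

  • OpenAI handlers: the stt and tts handlers used above (timbal.handlers.openai.stt and timbal.handlers.openai.tts)
  • ElevenLabs handler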

Configuration

Make sure you have the required API keys set in your environment:

  • OPENAI_API_KEY for OpenAI voice services
  • ELEVENLABS_API_KEY for ElevenLabs services
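
A missing key usually only surfaces once the agent calls a voice tool, so it can help to check up front. The sketch below uses only the standard library; the function name missing_voice_keys and the provider labels are illustrative and not part of Timbal. It assumes the variables are already present in the process environment (for example, exported in your shell or loaded from .env by your tooling).

```python
import os
from typing import Iterable, List, Mapping

# Environment variables named in this guide, keyed by an illustrative provider label.
VOICE_KEYS = {
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def missing_voice_keys(
    providers: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the variable names for `providers` that are unset or blank in `env`."""
    return [
        VOICE_KEYS[provider]
        for provider in providers
        if not env.get(VOICE_KEYS[provider], "").strip()
    ]


# Usage with an explicit mapping, so the result does not depend on your shell:
example_env = {"OPENAI_API_KEY": "sk-example"}
print(missing_voice_keys(["openai", "elevenlabs"], example_env))
# prints ['ELEVENLABS_API_KEY']
```

Calling missing_voice_keys(["openai"]) without the second argument checks the real environment; an empty list means the key needed for the example in this guide is set.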

The agent now has voice tools as part of its capabilities, making it truly voice-enabled!