Using Voice with Agents

Giving your Agent a Voice

Timbal agents can be given speech-to-text (STT) and text-to-speech (TTS) tools, enabling them to listen and speak. This example shows how to configure both for your agents.

Prerequisites

This example uses an OpenAI model and the OpenAI voice handlers. Make sure to add OPENAI_API_KEY to your .env file.

.env

OPENAI_API_KEY=your_api_key_here

Voice Tools

Create TTS and STT tools that the agent can use to process audio input and generate audio responses:

Speech-to-Text Tool

from timbal.core import Tool
from timbal.handlers.openai.stt import stt

# Create the STT tool
stt_tool_instance = Tool(
  name="speech_to_text",
  description="Convert audio input to text.",
  handler=stt
)

Text-to-Speech Tool

from timbal.core import Tool
from timbal.handlers.openai.tts import tts

# Create the TTS tool
tts_tool_instance = Tool(
  name="text_to_speech",
  description="Convert text to speech.",
  handler=tts
)

Voice-Enabled Agent

Create an agent that uses the voice tools to read audio files and respond with audio:

from timbal.core import Agent

voice_agent = Agent(
  name="voice-agent",
  description="An agent with voice capabilities for speaking and listening",
  system_prompt="""You are a voice-enabled agent that MUST follow these steps:
1. Convert the audio input to text using the speech_to_text tool
2. Process the text and generate a response
3. Convert your response to speech using the text_to_speech tool

Always use the speech_to_text tool when you receive audio input, and use the text_to_speech tool to respond with audio.""",
  model="openai/gpt-4.1",
  tools=[stt_tool_instance, tts_tool_instance]
)
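
Tool-calling models generally see a tool by its name value rather than by the Python variable that holds it, so the prompt is safest when it refers to the same strings passed to Tool(name=...). A minimal sketch of keeping the two in sync, using only the standard library; the constants and build_voice_prompt are hypothetical helpers, not part of Timbal:

```python
# Hypothetical constants: reuse these for both Tool(name=...) and the prompt.
STT_TOOL_NAME = "speech_to_text"
TTS_TOOL_NAME = "text_to_speech"


def build_voice_prompt(stt_name: str, tts_name: str) -> str:
    """Return a system prompt that refers to the tools by their registered names."""
    steps = [
        f"1. Convert the audio input to text using the {stt_name} tool",
        "2. Process the text and generate a response",
        f"3. Convert your response to speech using the {tts_name} tool",
    ]
    return (
        "You are a voice-enabled agent that MUST follow these steps:\n"
        + "\n".join(steps)
    )


system_prompt = build_voice_prompt(STT_TOOL_NAME, TTS_TOOL_NAME)
```

The resulting string can be passed as system_prompt when constructing the agent.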

Example usage

This example shows how the agent can process audio input and respond with audio:

import asyncio
from timbal.types.file import File

async def main():
  # Example: Agent receives audio input and responds with audio
  
  # 1. Create a message with audio input
  # Use a reliable sample audio file for testing
  audio_file = File.validate("https://cdn.openai.com/API/docs/audio/alloy.wav")
  
  prompt = [audio_file, "Please listen to this audio and respond with speech."]

  # 2. Agent processes audio and responds with audio
  response = await voice_agent(prompt=prompt).collect()
  
  # 3. The response will contain audio content
  print("Agent processed audio input and generated audio response!")
  
  # 4. Save the audio response
  if response.output.content:
    for content in response.output.content:
      if hasattr(content, 'file') and content.file:
        output_path = "agent_response.mp3"
        content.file.to_disk(output_path)
        print(f"Audio response saved to: {output_path}")

if __name__ == "__main__":
  asyncio.run(main())
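
The loop above writes every file to the same path, so if a response carried more than one file, later ones would overwrite earlier ones. A minimal sketch of a helper that gives each file its own numbered name; save_audio_parts is a hypothetical helper and relies only on the file attribute and to_disk call used in the example:

```python
from pathlib import Path
from typing import Iterable, List, Optional


def save_audio_parts(
    contents: Optional[Iterable[object]],
    out_dir: str,
    stem: str = "agent_response",
) -> List[Path]:
    """Write every content item that carries a file to its own numbered path.

    Assumes only what the example relies on: an optional `file` attribute
    exposing `to_disk(path)`. The ".mp3" suffix is carried over from the
    example and is not verified here.
    """
    saved: List[Path] = []
    for content in contents or []:
        file = getattr(content, "file", None)
        if not file:
            continue  # skip items without a file, such as plain text
        path = Path(out_dir) / f"{stem}_{len(saved)}.mp3"
        file.to_disk(str(path))
        saved.append(path)
    return saved
```

Inside main(), save_audio_parts(response.output.content, ".") could replace the nested loop; the returned list is empty when the response holds no files.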

Available voice handlers

Timbal provides several built-in voice handlers:

OpenAI Handlers

STT: timbal.handlers.openai.stt.stt - OpenAI Integration
TTS: timbal.handlers.openai.tts.tts - OpenAI Integration

ElevenLabs Handlers

STT: timbal.handlers.elevenlabs.stt.stt - ElevenLabs Integration
TTS: timbal.handlers.elevenlabs.tts.tts - ElevenLabs Integration
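
If you want to switch providers from configuration, the dotted paths listed above can be looked up by provider and kind. The following is a sketch under that assumption; VOICE_HANDLERS, handler_path, and load_handler are hypothetical names, and only the four paths come from the list:

```python
import importlib

# Dotted paths copied from the handler list; the dictionary keys are illustrative.
VOICE_HANDLERS = {
    ("openai", "stt"): "timbal.handlers.openai.stt.stt",
    ("openai", "tts"): "timbal.handlers.openai.tts.tts",
    ("elevenlabs", "stt"): "timbal.handlers.elevenlabs.stt.stt",
    ("elevenlabs", "tts"): "timbal.handlers.elevenlabs.tts.tts",
}


def handler_path(provider: str, kind: str) -> str:
    """Return the dotted path for a provider and a kind ("stt" or "tts")."""
    try:
        return VOICE_HANDLERS[(provider.lower(), kind.lower())]
    except KeyError:
        raise ValueError(
            f"No voice handler for provider={provider!r}, kind={kind!r}"
        ) from None


def load_handler(provider: str, kind: str):
    """Import and return the handler callable; requires timbal to be installed."""
    module_name, attr = handler_path(provider, kind).rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr)
```

The value returned by load_handler("elevenlabs", "stt") could then be passed as handler= when building the tool. Whether the ElevenLabs handlers accept the same inputs as the OpenAI ones is not stated here, so check their signatures before swapping.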

Configuration

Make sure you have the required API keys set in your environment:

OPENAI_API_KEY for OpenAI voice services
ELEVENLABS_API_KEY for ElevenLabs services
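
A missing key usually surfaces only when the first voice call fails, so it can help to check up front. A minimal sketch using the standard library; REQUIRED_KEYS and missing_api_keys are hypothetical names, and only the two variable names come from the list above:

```python
import os
from typing import List, Mapping, Optional

# Key names taken from the list above; the provider labels are illustrative.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def missing_api_keys(
    providers: List[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [
        REQUIRED_KEYS[p]
        for p in providers
        if not env.get(REQUIRED_KEYS[p], "").strip()
    ]
```

Calling missing_api_keys(["openai"]) before constructing the agent and stopping when the list is non-empty gives a clearer error than a failed API call. This reads the process environment, so whether values from your .env file are visible at that point depends on how your setup loads it.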

The agent now has voice tools as part of its capabilities, making it truly voice-enabled!
