Voice: Speech Capabilities for Agents
Add speech-to-text and text-to-speech capabilities to your agents with multiple voice providers.
Timbal agents can understand and generate speech using a variety of voice providers. This enables use cases like voice assistants, audio chatbots, and speech-to-speech interactions.
Using a Single Provider
You can add voice capabilities to your agent using a single provider, such as OpenAI or ElevenLabs.
- Speech-to-Text (STT): converts audio to text.
- Text-to-Speech (TTS): converts text to audio.
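To make the STT/TTS split concrete, here is a minimal plain-Python sketch. It is not Timbal's API: `SpeechToText`, `TextToSpeech`, `EchoProvider`, and `VoiceConfig` are hypothetical names, and the provider fakes audio as UTF-8 bytes so the example runs offline. With a single provider, one object fills both roles.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    """Anything that can turn raw audio bytes into text."""

    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Anything that can turn text into raw audio bytes."""

    def synthesize(self, text: str) -> bytes: ...


class EchoProvider:
    """Hypothetical offline provider offering both STT and TTS.

    A real provider would call a remote API; this one treats "audio"
    as UTF-8 bytes so the round trip can be checked locally.
    """

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class VoiceConfig:
    stt: SpeechToText
    tts: TextToSpeech


# Single provider: the same object is used for both capabilities.
provider = EchoProvider()
voice = VoiceConfig(stt=provider, tts=provider)

audio_out = voice.tts.synthesize("hello")
text_back = voice.stt.transcribe(audio_out)
print(text_back)  # hello
```

Typing the fields as protocols rather than a concrete class means any object with the right method can be plugged in, which is what makes swapping providers cheap.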
Using Multiple Providers
You can mix and match providers for STT and TTS. For example, use OpenAI for transcription and ElevenLabs for speech generation.
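The sketch below illustrates the mix-and-match idea in plain Python, assuming two hypothetical single-capability providers (`ProviderA` for transcription, `ProviderB` for speech). It does not call OpenAI, ElevenLabs, or Timbal; each class just records its calls so you can see which provider handled which step.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderA:
    """Hypothetical STT-only provider; a real one would call a transcription API."""
    calls: list[str] = field(default_factory=list)

    def transcribe(self, audio: bytes) -> str:
        self.calls.append("transcribe")
        return audio.decode("utf-8")


@dataclass
class ProviderB:
    """Hypothetical TTS-only provider; a real one would call a speech API."""
    calls: list[str] = field(default_factory=list)

    def synthesize(self, text: str) -> bytes:
        self.calls.append("synthesize")
        # Prefix the bytes so the output shows which provider produced it.
        return b"B:" + text.encode("utf-8")


@dataclass
class MixedVoice:
    stt: ProviderA  # transcription comes from one provider...
    tts: ProviderB  # ...speech generation from another

    def round_trip(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        return self.tts.synthesize(text)


stt_provider = ProviderA()
tts_provider = ProviderB()
voice = MixedVoice(stt=stt_provider, tts=tts_provider)

result = voice.round_trip(b"hello")
print(result)              # b'B:hello'
print(stt_provider.calls)  # ['transcribe']
print(tts_provider.calls)  # ['synthesize']
```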
Working with Audio Streams
You can pass audio files directly to an agent as its prompt. If the agent has an appropriate STT tool, it uses that tool automatically to transcribe the audio.
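One way to picture this routing is the hypothetical dispatch below: if the prompt names an existing file with an `audio/*` MIME type and an `"stt"` tool is registered, the file is transcribed; otherwise the prompt passes through as text. The tool name `"stt"`, the function names, and the MIME-type rule are assumptions for illustration, not Timbal's internal logic.

```python
import mimetypes
import os
import tempfile
from pathlib import Path
from typing import Callable, Mapping

SttTool = Callable[[bytes], str]


def is_audio_file(prompt: str) -> bool:
    """True if `prompt` names an existing file whose extension maps to audio/*."""
    try:
        if not Path(prompt).is_file():
            return False
    except OSError:  # e.g. a long text prompt that is not a valid path
        return False
    mime, _ = mimetypes.guess_type(prompt)
    return mime is not None and mime.startswith("audio/")


def prepare_prompt(prompt: str, tools: Mapping[str, SttTool]) -> str:
    """Hypothetical dispatch: transcribe audio prompts, pass text through."""
    if not is_audio_file(prompt):
        return prompt
    stt = tools.get("stt")
    if stt is None:
        raise ValueError(f"{prompt!r} looks like audio, but no STT tool is configured")
    return stt(Path(prompt).read_bytes())


def fake_stt(audio: bytes) -> str:
    # Stand-in for a real transcription call; it just decodes the bytes.
    return audio.decode("utf-8")


# Write a throwaway ".mp3" file so the example runs offline.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
    f.write(b"what's the weather?")
    clip = f.name

try:
    transcribed = prepare_prompt(clip, {"stt": fake_stt})
    passthrough = prepare_prompt("plain text", {})
finally:
    os.remove(clip)

print(transcribed)  # what's the weather?
print(passthrough)  # plain text
```

Raising when audio arrives without an STT tool makes a misconfiguration visible instead of silently sending a file path to the model as text.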
Speech-to-Speech Voice Interactions
You can build agents that both understand and respond in audio. For example, an agent can receive audio, transcribe it, generate a response, and then synthesize speech. Running such an agent produces an event stream like the following (fields elided):
StartEvent(..., path='agent', ...)
StartEvent(..., path='agent.llm-0', ...)
OutputEvent(..., path='agent.llm-0', ...)
StartEvent(..., path='agent.tts-call_...', ...)
OutputEvent(..., path='agent.tts-call_...', ...)
OutputEvent(..., path='agent', ...)
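The four stages can be sketched as a plain-Python pipeline that records start/output pairs under dotted paths, loosely echoing the `path` values in the log. This is a hypothetical model, not Timbal's event format: `Trace`, `speech_to_speech`, and the stub `fake_*` functions are illustrative names, and the sketch makes transcription an explicit `agent.stt` step even though the log shows no separate STT event.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Trace:
    """Records (event kind, path) pairs for each step of a run."""
    events: list[tuple[str, str]] = field(default_factory=list)

    def step(self, path: str, fn: Callable[[Any], Any], arg: Any) -> Any:
        self.events.append(("StartEvent", path))
        result = fn(arg)
        self.events.append(("OutputEvent", path))
        return result


def speech_to_speech(audio: bytes, stt, llm, tts, trace: Trace) -> bytes:
    # The outer "agent" span wraps the three nested steps.
    trace.events.append(("StartEvent", "agent"))
    text = trace.step("agent.stt", stt, audio)
    reply = trace.step("agent.llm", llm, text)
    speech = trace.step("agent.tts", tts, reply)
    trace.events.append(("OutputEvent", "agent"))
    return speech


# Hypothetical stubs standing in for real STT, LLM, and TTS calls.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


trace = Trace()
out = speech_to_speech(b"hi", fake_stt, fake_llm, fake_tts, trace)
print(out)  # b'You said: hi'
for kind, path in trace.events:
    print(kind, path)
```

Nesting child paths under the parent name (`agent.llm` inside `agent`) lets a consumer rebuild the call tree from a flat event stream by splitting on dots.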
Supported Voice Providers
Timbal supports multiple providers for both STT and TTS:
- OpenAI: High-quality transcription and speech synthesis.
- ElevenLabs: Advanced, natural-sounding voices and robust transcription.
- (More providers coming soon!)
For more details, see the ElevenLabs Integration and OpenAI Integration pages.