Endpoint
POST /v1/audio/stream
Authentication
Bearer token required
Overview
The streaming endpoint converts text to speech and streams MP3 audio in real time. It is suited to applications that need low-latency audio playback, such as real-time assistants or live caption-to-speech conversion.
Response Format: audio/mpeg (chunked MP3)
First Byte Latency: < 500ms
Request Parameters
The text to convert to speech. Maximum 5,000 characters.
The synthesis model to use. Currently supported: legacy-v2.5
The voice ID to use. Get available voices from the voices endpoint.
Adjust the voice pitch. Range: -100% to +100%. Default: +0%
Emotional speaking style. Options: neutral, cheerful, calm, angry, sad, excited, whispering. Default: calm
Intensity of the selected style. Range: 0.5 to 2.0. Default: 1.5
BCP-47 language code (e.g., en-US, fr-FR). Default: en-US
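The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch only: the parameter names are not shown on this page, so the keys `text`, `pitch`, `style`, and `style_degree` are hypothetical placeholders, and the defaults mirror the ones documented above.

```python
import re

# Hypothetical field names: the real parameter names are not shown on this page.
ALLOWED_STYLES = {"neutral", "cheerful", "calm", "angry", "sad", "excited", "whispering"}
PITCH_PATTERN = re.compile(r"^[+-]\d{1,3}%$")  # e.g. "+0%", "-25%", "+100%"


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []

    text = params.get("text", "")
    if not text:
        problems.append("text is empty")
    elif len(text) > 5000:
        problems.append("text exceeds 5,000 characters")

    pitch = params.get("pitch", "+0%")
    # The regex check runs first, so int() only sees well-formed values.
    if not PITCH_PATTERN.match(pitch) or abs(int(pitch[:-1])) > 100:
        problems.append("pitch must be between -100% and +100%")

    style = params.get("style", "calm")
    if style not in ALLOWED_STYLES:
        problems.append(f"unknown style: {style}")

    degree = params.get("style_degree", 1.5)
    if not 0.5 <= degree <= 2.0:
        problems.append("style intensity must be between 0.5 and 2.0")

    return problems
```

Validating locally avoids a round trip for requests that would presumably be rejected; the server remains the authority on what is accepted.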
Examples
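As an illustration, a request to `POST /v1/audio/stream` might be built and consumed as sketched below, using only the Python standard library. Several details are assumptions: the host `api.example.com` is a placeholder because the base URL is not given on this page, the JSON field names `text`, `model`, and `voice` are hypothetical, and `build_stream_request` and `stream_to_file` are illustrative helper names rather than part of any SDK.

```python
import json
import urllib.request

# Placeholder host: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_stream_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    """Build a POST request for /v1/audio/stream with a Bearer token."""
    # Hypothetical JSON field names ("text", "model", "voice").
    body = json.dumps({"text": text, "model": "legacy-v2.5", "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/audio/stream",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def stream_to_file(request: urllib.request.Request, path: str, chunk_size: int = 4096) -> int:
    """Write audio to disk chunk by chunk as it arrives; return bytes written."""
    written = 0
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read means the stream has ended
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Calling `stream_to_file(build_stream_request(key, "Welcome to Suonora TTS!", "axel"), "out.mp3")` would save the stream to disk. For live playback, each chunk would be handed to an audio player instead of a file.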
Response
The endpoint streams MP3 audio data with the following headers:
Content-Type: audio/mpeg
Transfer-Encoding: chunked
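A client may want to confirm that a response really carries MP3 audio before passing bytes to a decoder. The sketch below assumes that `audio/mpeg` arrives as the value of the standard `Content-Type` header; `is_mp3_stream` is a hypothetical helper name.

```python
def is_mp3_stream(headers: dict) -> bool:
    """Return True if the Content-Type header announces MP3 audio."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    normalised = {name.lower(): value for name, value in headers.items()}
    content_type = normalised.get("content-type", "")
    # Ignore optional parameters that may follow the media type after ";".
    return content_type.split(";")[0].strip().lower() == "audio/mpeg"
```

If the check fails, the body is probably an error payload rather than audio and should be handled separately.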
Error Responses
Best Practices
- Connection Management: Use HTTP/2 or keep-alive connections to reduce latency
- Back-pressure: Process chunks as they arrive to maintain stream health
- Error Recovery: Implement reconnection logic for network interruptions
- Browser Support: Use the MediaSource API for optimal browser streaming
- Security: Keep your API key secure and never expose it in client-side code
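The error-recovery advice above can be sketched as a retry wrapper with exponential backoff. This is one possible approach, not a documented client behavior: `with_retries` is a hypothetical helper, the attempt count and delays are arbitrary, and the exception types to catch depend on the HTTP client in use.

```python
import time


def with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Each retry starts a new request from the beginning. Whether the API can resume a partially delivered stream is not stated here, so a player would need to discard or skip audio it has already received.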
Streaming vs Standard Endpoint
Use Streaming When
- Real-time playback is needed
- Low latency is critical
- Processing long texts
- Building conversational apps
Use Standard When
- Saving audio to files
- Offline caching
- Simple playback
- Short text snippets
Authorizations
Your API key as a Bearer token
Body
application/json
Text to synthesize (up to 5,000 characters). Example: "Welcome to Suonora TTS!"
Name of the synthesis model. Example: "legacy-v2.5"
Voice ID from the voice gallery. Example: "axel"
Pitch adjustment from -100% to +100%. Example: "+0%"
Emotional speaking style. Available options: neutral, cheerful, calm, angry, sad, excited, whispering. Example: "calm"
Intensity of the selected style. Required range: 0.5 <= x <= 2. Example: 1.5
BCP-47 language code. Example: "en-US"
Response
Successful response
Streaming MP3 audio data