Text-to-Speech Streaming API
Welcome to Suonora’s low-latency streaming endpoint!
/v1/audio/stream begins sending audio frames as soon as synthesis starts, so end users hear speech almost immediately, even for long passages. Everything else (authentication, parameters, limits) matches the “normal” /v1/audio/speech endpoint.
Overview
Suonora TTS Streaming is ideal for:
- Real-time assistants & IVR – start speaking back while text is still arriving.
- Live caption → speech – minimal delay from caption feed to voiced output.
- Large documents – no need to wait for the full MP3 to finish rendering.
What’s different?
Aspect | /v1/audio/speech (normal) | /v1/audio/stream (this doc) |
---|---|---|
Response headers | Content-Type: audio/mpeg; Content-Length: <size> | Content-Type: audio/mpeg; Transfer-Encoding: chunked |
First byte arrives | After synthesis completes | Within a few hundred ms |
Best for | Short prompts, offline caching | Conversational & real-time apps |
All request parameters, limits, and pricing tiers are identical.
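The header difference in the table can be checked on the client. The following is a minimal sketch, not part of any Suonora SDK: `describeDelivery` is a hypothetical helper, and it assumes the headers listed above are visible to the caller. Over HTTP/2 no Transfer-Encoding header is sent, so a missing Content-Length is treated as the streaming signal.

```javascript
// Hypothetical helper: classify a response as "buffered" or "streamed"
// from its headers. Accepts any object with a get(name) method,
// such as a fetch Headers instance or a Map.
function describeDelivery(headers) {
  const length = headers.get("content-length");
  const encoding = (headers.get("transfer-encoding") || "").toLowerCase();
  if (encoding.includes("chunked")) return "streamed";
  // A known size up front means the full body was rendered first.
  if (length != null) return "buffered";
  // Neither header present (for example over HTTP/2): size unknown, treat as streamed.
  return "streamed";
}
```

With the built-in fetch this would be called as `describeDelivery(response.headers)`.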
Authentication
Use the same Bearer-token mechanism described in the main documentation:
-H "Authorization: Bearer YOUR_API_KEY"
HTTPS is mandatory.
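In application code, the same header can be assembled once and reused for both endpoints. This is a small sketch under assumptions: `buildHeaders` is a hypothetical helper, and the `SUONORA_API_KEY` environment variable name in the usage note is chosen for illustration only.

```javascript
// Hypothetical helper: build the request headers for a Suonora TTS call.
// Fails early if the key is missing instead of sending an unauthenticated request.
function buildHeaders(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("Missing API key");
  }
  return {
    Authorization: `Bearer ${apiKey.trim()}`,
    "Content-Type": "application/json",
  };
}
```

In Node.js this might be called as `buildHeaders(process.env.SUONORA_API_KEY)`, which keeps the key out of source control.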
Quick-start (cURL)
curl --no-buffer -X POST https://api.suonora.com/v1/audio/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input":"Streaming is fast!","model":"legacy-v2.5","voice":"axel"}' \
  --output - \
  | ffplay -autoexit -nodisp -
ffplay (from FFmpeg) consumes the chunked MP3 and plays it live.
API Reference
Generate Streaming Speech
POST /v1/audio/stream
HTTP Request
POST https://api.suonora.com/v1/audio/stream
Headers
Header | Value |
---|---|
Authorization | Bearer YOUR_API_KEY |
Content-Type | application/json |
Request Body
Exactly the same schema as /v1/audio/speech:
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
input | string | Yes | — | Text to synthesize (≤ 5 000 chars). |
model | string | Yes | — | Synthesis model name. |
voice | string | Yes | — | Voice ID. |
pitch | string | No | +0% | Range -100% … +100%. |
style | string | No | calm | Emotional speaking style. |
styleDegree | number | No | 1.5 | Intensity 0.5 … 2.0. |
lang | string | No | en-US | BCP-47 language tag. |
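The constraints in the table can be checked on the client before a request is sent. The sketch below is illustrative only: `validateRequest` is a hypothetical helper, the server's own validation may differ, and counting characters with `String.length` (UTF-16 code units) is an assumption about how the 5 000-character limit is measured.

```javascript
// Hypothetical client-side check mirroring the parameter table.
// Returns a list of problems; an empty list means the body looks valid.
function validateRequest(body) {
  const errors = [];
  // input, model and voice are required strings.
  for (const field of ["input", "model", "voice"]) {
    if (typeof body[field] !== "string" || body[field].length === 0) {
      errors.push(`${field} is required`);
    }
  }
  if (typeof body.input === "string" && body.input.length > 5000) {
    errors.push("input exceeds 5000 characters");
  }
  // pitch is a signed percentage string such as "+0%" or "-25%".
  if (body.pitch !== undefined) {
    const match = /^([+-]?\d+(?:\.\d+)?)%$/.exec(String(body.pitch));
    if (!match || Math.abs(Number(match[1])) > 100) {
      errors.push("pitch must be between -100% and +100%");
    }
  }
  // styleDegree is a number between 0.5 and 2.0 (NaN is rejected too).
  if (body.styleDegree !== undefined) {
    const degree = body.styleDegree;
    if (typeof degree !== "number" || !(degree >= 0.5 && degree <= 2.0)) {
      errors.push("styleDegree must be between 0.5 and 2.0");
    }
  }
  return errors;
}
```

Rejecting an invalid body locally avoids a round trip that would end in a 400 response.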
Response
Header | Value |
---|---|
Content-Type | audio/mpeg |
Transfer-Encoding | chunked |
Body: A chunked MP3 stream. Playback can begin as soon as the first chunk arrives.
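Reading the body chunk by chunk could look like the sketch below. `consumeChunks` is a hypothetical helper and assumes `body` is a web ReadableStream, as returned in `response.body` by the built-in fetch in browsers and recent Node.js versions. With node-fetch, `response.body` is a Node.js stream instead and would be consumed differently.

```javascript
// Hypothetical helper: read a streamed body incrementally.
// Each chunk (a Uint8Array) is handed to onChunk as soon as it arrives;
// the function resolves with the total number of bytes received.
async function consumeChunks(body, onChunk) {
  const reader = body.getReader();
  let total = 0;
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      total += value.byteLength;
      // Awaiting the callback means the next read only starts
      // once the consumer has finished handling this chunk.
      await onChunk(value);
    }
  } finally {
    reader.releaseLock();
  }
  return total;
}
```

Because each read waits for the previous chunk to be handled, a slow consumer naturally slows the rate at which data is pulled from the connection.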
Usage Examples
Node.js (stream directly to a file or speaker)
import fetch from "node-fetch";
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
async function streamTTS() {
  const response = await fetch("https://api.suonora.com/v1/audio/stream", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Hello, this is Suonora streaming in real time!",
      model: "legacy-v2.5",
      voice: "axel",
    }),
  });

  if (!response.ok) throw new Error(`HTTP ${response.status}`);

  // Pipe the readable stream straight to a file (or Speaker / prism-media)
  await pipeline(response.body, createWriteStream("realtime.mp3"));
  console.log("Saved realtime.mp3");
}

streamTTS().catch(console.error);
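Swap createWriteStream for a playback pipeline built on libraries like speaker or prism-media to play on the fly. Note that speaker expects decoded PCM audio, so the MP3 stream needs to pass through a decoder first.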
Browser Example (HTML + JS)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>Suonora TTS Streaming Demo</title>
</head>
<body>
<button id="play">Stream &amp; Play</button>
<audio id="player" controls></audio>
<script>
document.getElementById("play").onclick = async () => {
  const mime = "audio/mpeg";
  // Not every browser supports MP3 in Media Source Extensions, so check first.
  if (!window.MediaSource || !MediaSource.isTypeSupported(mime)) {
    return alert("This browser cannot stream " + mime + " through MediaSource.");
  }
  const res = await fetch("https://api.suonora.com/v1/audio/stream", {
    method: "POST",
    headers: {
      // Demo only: avoid exposing a real API key in client-side code.
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Streaming from Suonora with minimal latency.",
      model: "legacy-v2.5",
      voice: "axel",
    }),
  });
  if (!res.ok) return alert("TTS request failed: " + res.status);
  // Convert ReadableStream -> MediaSource for immediate playback
  const mediaSource = new MediaSource();
  const audio = document.getElementById("player");
  audio.src = URL.createObjectURL(mediaSource);
  audio.play().catch(console.error);
  mediaSource.addEventListener("sourceopen", async () => {
    const sourceBuffer = mediaSource.addSourceBuffer(mime);
    const reader = res.body.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // Register the listener before appending so "updateend" cannot be missed.
      const appended = new Promise((resolve) =>
        sourceBuffer.addEventListener("updateend", resolve, { once: true })
      );
      sourceBuffer.appendBuffer(value);
      await appended;
    }
    mediaSource.endOfStream();
  }, { once: true });
};
</script>
</body>
</html>
The MediaSource API feeds arriving chunks straight into the <audio> tag, so playback begins almost instantly.
Error Handling
Status codes and their meanings are identical to those of the normal endpoint (400, 401, 429, 500).
For transient network issues during a long stream, reconnect and resend the request; Suonora’s engine will resume where possible.
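The reconnect-and-resend advice can be wrapped in a retry loop. The sketch below rests on assumptions: treating 429 and 500 as retryable, the attempt count, and the backoff delays are illustrative choices rather than documented Suonora behavior, and `requestWithRetry` is a hypothetical helper. It resends the whole request and does not attempt to resume a stream that failed partway through.

```javascript
// Assumption: of the listed status codes, only 429 and 500 are worth retrying;
// 400 and 401 indicate a problem with the request itself.
const RETRYABLE_STATUSES = new Set([429, 500]);

function isRetryable(status) {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `send` is a caller-supplied function that issues the request and
// resolves with a fetch-style response (an object with a `status`).
async function requestWithRetry(send, maxAttempts = 4, sleep = defaultSleep) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (attempt > 0) await sleep(backoffDelay(attempt - 1));
    try {
      const response = await send();
      // Success and non-retryable errors go straight back to the caller.
      if (!isRetryable(response.status)) return response;
      lastError = new Error(`HTTP ${response.status}`);
    } catch (err) {
      lastError = err; // network failure: try again
    }
  }
  throw lastError;
}
```

A caller would pass a function that performs the fetch, for example `requestWithRetry(() => fetch(url, options))`, where `url` and `options` are the request details from the examples above.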
Best Practices (Streaming)
- Keep connections alive. HTTP/2 or keep-alive sockets reduce handshake latency.
- Back-pressure. Consume data continuously; otherwise the server may slow down or close the stream.
- Progressive UI. Start playing audio as soon as the first chunks arrive for the snappiest UX.
- Reuse logic. The same caching, retry, and chunk-splitting advice from the normal API applies here.
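The chunk-splitting advice from the normal API is not reproduced in this document, so the following is only one possible reading of it: break long text into pieces that fit the 5 000-character input limit, preferring sentence boundaries. `splitText` is a hypothetical helper, and measuring length in UTF-16 code units is an assumption.

```javascript
// Hypothetical helper: split text into parts no longer than maxLen,
// breaking at sentence ends (. ! ?) where possible.
function splitText(text, maxLen = 5000) {
  // Each match is one sentence with its trailing punctuation and whitespace,
  // or a final fragment without punctuation.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const parts = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new part if this sentence would overflow the current one.
    if (current && current.length + sentence.length > maxLen) {
      parts.push(current.trim());
      current = "";
    }
    // A single sentence longer than maxLen is cut at the limit.
    let rest = sentence;
    while (rest.length > maxLen) {
      parts.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    current += rest;
  }
  if (current.trim()) parts.push(current.trim());
  return parts;
}
```

Each part can then be sent as the input of its own request, in order.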
Changelog
- v1.3.0 – May 2025: New endpoint /v1/audio/stream for real-time chunked MP3 streaming.
Enjoy ultra-low-latency speech synthesis with Suonora’s Streaming API!