Text-to-Speech Streaming API

Welcome to Suonora’s low-latency streaming endpoint! /v1/audio/stream begins sending audio frames as soon as synthesis starts, so end users hear speech almost immediately, even for long passages. Authentication, request parameters, and limits all match the “normal” /v1/audio/speech endpoint.

Overview

Suonora TTS Streaming is ideal for:

Real-time assistants & IVR – start speaking back while text is still arriving.
Live caption → speech – minimal delay from caption feed to voiced output.
Large documents – no need to wait for the full MP3 to finish rendering.

What’s different?

	`/v1/audio/speech` (normal)	`/v1/audio/stream` (this doc)
Response headers	`Content-Type: audio/mpeg` `Content-Length: <size>`	`Content-Type: audio/mpeg` `Transfer-Encoding: chunked`
First byte arrives	After synthesis completes	Within a few hundred ms
Best for	Short prompts, offline caching	Conversational & real-time apps

All request parameters, limits, and pricing tiers are identical.
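As a rough illustration of the header differences in the table, the sketch below classifies a response by inspecting its headers. It assumes a fetch-style `Headers` object (global in modern browsers and Node.js 18+); the helper name `deliveryMode` is hypothetical and not part of the Suonora API. Because HTTP/2 does not use `Transfer-Encoding: chunked`, a response with neither header is also treated as streamed here, which is an assumption of this sketch.

```javascript
// Hypothetical helper: classify a response as "streamed" or "buffered"
// using the response headers listed in the comparison table.
function deliveryMode(headers) {
  const transferEncoding = (headers.get("transfer-encoding") || "").toLowerCase();
  if (transferEncoding.includes("chunked")) return "streamed";
  // A known Content-Length means the full body was rendered before sending.
  if (headers.get("content-length") != null) return "buffered";
  // Neither header present (for example over HTTP/2): length is unknown,
  // so this sketch assumes the body is streamed.
  return "streamed";
}
```

For example, `deliveryMode(response.headers)` could be logged during development to confirm that a client is really hitting the streaming endpoint.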

Authentication

Use the same Bearer-token mechanism described in the main documentation:

-H "Authorization: Bearer YOUR_API_KEY"

HTTPS is mandatory.
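A minimal sketch of building the two request headers in one place, so the key is never hard-coded in application code. The helper name `buildHeaders` and the environment variable `SUONORA_API_KEY` are assumptions made for illustration, not official names.

```javascript
// Hypothetical helper: build the request headers used throughout this page.
// Throws early if the key is missing instead of sending an unauthenticated request.
function buildHeaders(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("Missing API key");
  }
  return {
    Authorization: `Bearer ${apiKey.trim()}`,
    "Content-Type": "application/json",
  };
}

// Usage in Node.js (SUONORA_API_KEY is an assumed variable name):
// const headers = buildHeaders(process.env.SUONORA_API_KEY);
```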

Quick-start (cURL)

curl -X POST https://api.suonora.com/v1/audio/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  --output - \
  -d '{"input":"Streaming is fast!","model":"legacy-v2.5","voice":"axel"}' \
  | ffplay -autoexit -nodisp -

ffplay (from FFmpeg) consumes the chunked MP3 and plays it live.

API Reference

Generate Streaming Speech

POST /v1/audio/stream

HTTP Request

POST https://api.suonora.com/v1/audio/stream

Headers

Header	Value
Authorization	`Bearer YOUR_API_KEY`
Content-Type	`application/json`

Request Body

The request body uses exactly the same schema as /v1/audio/speech:

Parameter	Type	Required	Default	Description
`input`	string	Yes	—	Text to synthesize (≤ 5 000 chars).
`model`	string	Yes	—	Synthesis model name.
`voice`	string	Yes	—	Voice ID.
`pitch`	string	No	`+0%`	Range `-100%` … `+100%`.
`style`	string	No	`calm`	Emotional speaking style.
`styleDegree`	number	No	`1.5`	Intensity `0.5` … `2.0`.
`lang`	string	No	`en-US`	BCP-47 language tag.
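The limits in the table can be checked on the client before a request is sent. The following is a sketch only: `validateRequest` is a hypothetical helper, the server remains the authority on what is accepted, and the sketch assumes the 5 000-character limit can be approximated with JavaScript's `String.length` (UTF-16 code units), which may not match how the API counts characters.

```javascript
// Hypothetical client-side check based on the parameter table.
// Returns a list of human-readable problems; an empty list means "looks valid".
function validateRequest(body) {
  const errors = [];
  for (const key of ["input", "model", "voice"]) {
    if (typeof body[key] !== "string" || body[key].length === 0) {
      errors.push(`${key} is required and must be a non-empty string`);
    }
  }
  if (typeof body.input === "string" && body.input.length > 5000) {
    errors.push("input exceeds 5000 characters");
  }
  if (body.pitch !== undefined) {
    // Expect a signed percentage such as "+10%" or "-25%".
    const match = /^([+-]?\d+(?:\.\d+)?)%$/.exec(String(body.pitch));
    if (!match || Math.abs(Number(match[1])) > 100) {
      errors.push("pitch must be between -100% and +100%");
    }
  }
  if (body.styleDegree !== undefined) {
    const degree = body.styleDegree;
    // The negated range test also rejects NaN.
    if (typeof degree !== "number" || !(degree >= 0.5 && degree <= 2.0)) {
      errors.push("styleDegree must be a number between 0.5 and 2.0");
    }
  }
  return errors;
}
```

Calling `validateRequest(body)` before `fetch` lets an application surface problems without spending a round trip on a request that would return 400.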

Response

Header	Value
Content-Type	`audio/mpeg`
Transfer-Encoding	`chunked`

Body: A chunked MP3 stream. Playback can begin as soon as the first chunk arrives.
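To make the "first chunk" behaviour concrete, the sketch below reads a response body chunk by chunk and records how long the first chunk took to arrive. It assumes a web `ReadableStream` body, as returned by `fetch` in browsers and by the built-in `fetch` in Node.js 18+ (not the Node stream returned by node-fetch). The names `consumeStream` and `onChunk` are illustrative only.

```javascript
// Read a web ReadableStream chunk by chunk and report when the first chunk arrived.
// onChunk receives each Uint8Array as soon as it is available.
async function consumeStream(body, onChunk) {
  const reader = body.getReader();
  const startedAt = Date.now();
  let firstChunkMs = null;
  let totalBytes = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (firstChunkMs === null) firstChunkMs = Date.now() - startedAt;
    totalBytes += value.byteLength;
    onChunk(value); // e.g. hand the bytes to a decoder or player
  }
  return { firstChunkMs, totalBytes };
}
```

A caller might use it as `const stats = await consumeStream(response.body, chunk => player.push(chunk))`, where `player` stands in for whatever playback component the application uses.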

Usage Examples

Node.js (stream directly to a file or speaker)

javascript

import fetch from "node-fetch";
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
 
async function streamTTS() {
  const response = await fetch("https://api.suonora.com/v1/audio/stream", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Hello, this is Suonora streaming in real time!",
      model: "legacy-v2.5",
      voice: "axel",
    }),
  });
 
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
 
  // Pipe the readable stream straight to a file (or Speaker / prism-media)
  await pipeline(response.body, createWriteStream("realtime.mp3"));
  console.log("Saved realtime.mp3");
}
 
streamTTS().catch(console.error);

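To play audio on the fly instead of saving it, replace createWriteStream with a playback pipeline. Note that speaker expects raw PCM, so the MP3 stream must be decoded first (for example with an FFmpeg-based transcoder such as the one in prism-media).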

Browser Example (HTML + JS)

html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Suonora TTS Streaming Demo</title>
</head>
<body>
  <button id="play">Stream & Play</button>
  <audio id="player" controls></audio>
 
  <script>
    document.getElementById("play").onclick = async () => {
      const res = await fetch("https://api.suonora.com/v1/audio/stream", {
        method: "POST",
        headers: {
          Authorization: "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          input: "Streaming from Suonora with minimal latency.",
          model: "legacy-v2.5",
          voice: "axel"
        }),
      });
 
      if (!res.ok) return alert("TTS request failed: " + res.status);
 
      // Convert ReadableStream -> MediaSource for immediate playback
      const mediaSource = new MediaSource();
      const audio = document.getElementById("player");
      audio.src = URL.createObjectURL(mediaSource);
      audio.play();
 
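      mediaSource.addEventListener("sourceopen", async () => {
        const mime = "audio/mpeg";
        // MP3 via Media Source Extensions is not available in every browser
        if (!MediaSource.isTypeSupported(mime)) {
          return alert("This browser cannot stream " + mime + " through MediaSource");
        }
        const sourceBuffer = mediaSource.addSourceBuffer(mime);

        const reader = res.body.getReader();
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          // Append one chunk, then wait for "updateend" before appending the next
          const appended = new Promise(r => sourceBuffer.addEventListener("updateend", r, { once: true }));
          sourceBuffer.appendBuffer(value);
          await appended;
        }
        mediaSource.endOfStream();
      }, { once: true });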
    };
  </script>
</body>
</html>

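The MediaSource API feeds arriving chunks straight into the <audio> tag, so playback begins almost instantly. Support for audio/mpeg in MediaSource varies by browser, so check MediaSource.isTypeSupported("audio/mpeg") before relying on it.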

Error Handling

The status codes and their meanings are identical to those of the normal endpoint (400, 401, 429, 500). For transient network issues during a long stream, reconnect and resend the request; Suonora’s engine will resume where possible.
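The reconnect-and-resend advice can be sketched as a small retry wrapper with exponential backoff. Everything here is illustrative: `withRetry`, `retries`, and `baseDelayMs` are made-up names and defaults, and treating 429 and 500 as retryable while passing 400 and 401 straight back is an assumption of this sketch rather than documented Suonora behaviour. The wrapper simply resends the whole request; it does not rely on any server-side resume.

```javascript
// Retry a request function on transient failures with exponential backoff.
// requestFn must return a promise for an object with a numeric `status`.
async function withRetry(requestFn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const response = await requestFn();
      // 429 and 500 are treated as retryable; anything else is returned as-is.
      if (response.status !== 429 && response.status !== 500) return response;
      lastError = new Error(`HTTP ${response.status}`);
    } catch (err) {
      lastError = err; // network-level failure (connection reset, DNS, ...)
    }
    if (attempt < retries) {
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

A call might look like `const res = await withRetry(() => fetch(url, options))`. If the connection drops partway through a stream, the code that consumes the body would need to be inside the retried function as well.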

Best Practices (Streaming)

Keep connections alive. HTTP/2 or keep-alive sockets reduce handshake latency.
Back-pressure. Consume data continuously; otherwise the server may slow down or close the stream.
Progressive UI. Start playing audio as soon as the first chunks arrive for the snappiest UX.
Reuse logic. The same caching, retry, and chunk-splitting advice from the normal API applies here.
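The chunk-splitting advice can be illustrated with a helper that breaks long text into pieces that fit the 5 000-character input limit, preferring sentence boundaries. This is a sketch under stated assumptions: `splitText` is a hypothetical name, sentence detection is a simple punctuation heuristic, and length is measured with `String.length`.

```javascript
// Split text into pieces of at most maxLen characters, preferring sentence ends.
function splitText(text, maxLen = 5000) {
  // A "sentence" is a run of text ending in ., ! or ? plus trailing whitespace,
  // or whatever unterminated text remains at the end.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length > 0 && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    // A single sentence longer than maxLen is cut at the limit.
    let rest = sentence;
    while (rest.length > maxLen) {
      chunks.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    current += rest;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Each returned piece can then be sent as the `input` of its own request, in order, so that playback of the first piece starts while later pieces are still being synthesized.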

Changelog

v1.3.0 – May 2025: New endpoint /v1/audio/stream for real-time chunked MP3 streaming.

Enjoy ultra-low-latency speech synthesis with Suonora’s Streaming API!
