Stream Speech

Text-to-Speech Streaming API

Welcome to Suonora’s low-latency streaming endpoint! /v1/audio/stream begins sending audio frames as soon as synthesis starts, so end users hear speech almost immediately, even for long passages. Everything else (authentication, parameters, limits) matches the “normal” /v1/audio/speech endpoint.


Overview

Suonora TTS Streaming is ideal for:

  • Real-time assistants & IVR – start speaking back while text is still arriving.
  • Live caption → speech – minimal delay from caption feed to voiced output.
  • Large documents – no need to wait for the full MP3 to finish rendering.

What’s different?

| | /v1/audio/speech (normal) | /v1/audio/stream (this doc) |
| --- | --- | --- |
| Response headers | Content-Type: audio/mpeg | Content-Type: audio/mpeg |
| | Content-Length: <size> | Transfer-Encoding: chunked |
| First byte arrives | After synthesis completes | Within a few hundred ms |
| Best for | Short prompts, offline caching | Conversational & real-time apps |

All request parameters, limits, and pricing tiers are identical.


Authentication

Use the same Bearer-token mechanism described in the main documentation:

-H "Authorization: Bearer YOUR_API_KEY"

HTTPS is mandatory.
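
As a minimal sketch of the two rules above (Bearer token, HTTPS only), the helpers below build the request headers and refuse a plain-HTTP URL before any credentials leave the client. The names buildHeaders and assertHttps are hypothetical and not part of any Suonora SDK.

```javascript
// Hypothetical helper: builds the headers used throughout this page.
function buildHeaders(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("Missing API key");
  }
  return {
    Authorization: `Bearer ${apiKey.trim()}`,
    "Content-Type": "application/json",
  };
}

// Hypothetical helper: throws unless the URL uses the https: scheme.
function assertHttps(url) {
  if (new URL(url).protocol !== "https:") {
    throw new Error(`Refusing non-HTTPS URL: ${url}`);
  }
  return url;
}
```

In a Node.js service the key would typically come from the environment, for example buildHeaders(process.env.SUONORA_API_KEY), where the variable name is an assumption.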


Quick-start (cURL)

sh
curl --silent --show-error --fail --no-buffer \
  -X POST https://api.suonora.com/v1/audio/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input":"Streaming is fast!","model":"legacy-v2.5","voice":"axel"}' \
  --output - \
  | ffplay -autoexit -nodisp -

ffplay (from FFmpeg) consumes the chunked MP3 and plays it live.


API Reference

Generate Streaming Speech

POST /v1/audio/stream

HTTP Request

POST https://api.suonora.com/v1/audio/stream

Headers

| Header | Value |
| --- | --- |
| Authorization | Bearer YOUR_API_KEY |
| Content-Type | application/json |

Request Body

Exactly the same schema as /v1/audio/speech:

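| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input | string | Yes | | Text to synthesize (≤ 5 000 chars). |
| model | string | Yes | | Synthesis model name. |
| voice | string | Yes | | Voice ID. |
| pitch | string | No | +0% | Range -100% to +100%. |
| style | string | No | calm | Emotional speaking style. |
| styleDegree | number | No | 1.5 | Intensity 0.5 to 2.0. |
| lang | string | No | en-US | BCP-47 language tag. |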
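
The constraints in the parameter table above can be checked on the client before a request is sent. The following is a sketch only: validateSpeechRequest is a hypothetical helper, the server remains the authority, and it is an assumption that the 5 000-character limit matches JavaScript's string length (UTF-16 code units).

```javascript
// Hypothetical client-side validation mirroring the documented parameters.
// Returns a list of problems; an empty list means the body looks valid.
function validateSpeechRequest(body) {
  const errors = [];
  // input, model and voice are the three required string fields.
  for (const key of ["input", "model", "voice"]) {
    if (typeof body[key] !== "string" || body[key].length === 0) {
      errors.push(`${key} is required and must be a non-empty string`);
    }
  }
  // Assumption: the limit is counted the same way as String.length.
  if (typeof body.input === "string" && body.input.length > 5000) {
    errors.push("input exceeds 5000 characters");
  }
  // pitch is a percentage string such as "+0%" or "-20%".
  if (body.pitch !== undefined) {
    const m = /^([+-]?\d+(?:\.\d+)?)%$/.exec(String(body.pitch));
    if (!m || Math.abs(Number(m[1])) > 100) {
      errors.push("pitch must be a percentage between -100% and +100%");
    }
  }
  // styleDegree is a number from 0.5 to 2.0 (NaN is rejected too).
  if (body.styleDegree !== undefined) {
    const d = body.styleDegree;
    if (typeof d !== "number" || !(d >= 0.5 && d <= 2.0)) {
      errors.push("styleDegree must be a number between 0.5 and 2.0");
    }
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a UI show every issue at once.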

Response

| Header | Value |
| --- | --- |
| Content-Type | audio/mpeg |
| Transfer-Encoding | chunked |

Body: A chunked MP3 stream. Playback can begin as soon as the first chunk arrives.
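
To illustrate consuming the body incrementally, the sketch below reads chunks as they arrive, hands each one to a callback, and reports how long the first chunk took. consumeStream is a hypothetical helper. It assumes the body is async-iterable, which holds for the built-in fetch in Node.js 18+ and for node-fetch, but not for every browser.

```javascript
// Hypothetical helper: iterate over a streamed body chunk by chunk.
// `body` is any async iterable of Uint8Array chunks (e.g. response.body).
// `onChunk` is called for every chunk as soon as it arrives.
async function consumeStream(body, onChunk) {
  const start = performance.now();
  let firstChunkMs = null;
  let totalBytes = 0;
  for await (const chunk of body) {
    if (firstChunkMs === null) {
      // Time from calling consumeStream until the first chunk, not from
      // the moment the HTTP request was sent.
      firstChunkMs = performance.now() - start;
    }
    totalBytes += chunk.byteLength;
    onChunk(chunk);
  }
  return { firstChunkMs, totalBytes };
}
```

A caller would pass response.body along with a callback that forwards each chunk to a decoder or player. To measure latency from the request itself, take the timestamp before calling fetch instead.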


Usage Examples

Node.js (stream directly to a file or speaker)

javascript
import fetch from "node-fetch";
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
 
async function streamTTS() {
  const response = await fetch("https://api.suonora.com/v1/audio/stream", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Hello, this is Suonora streaming in real time!",
      model: "legacy-v2.5",
      voice: "axel",
    }),
  });
 
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
 
  // Pipe the readable stream straight to a file (or Speaker / prism-media)
  await pipeline(response.body, createWriteStream("realtime.mp3"));
  console.log("Saved realtime.mp3");
}
 
streamTTS().catch(console.error);

Swap createWriteStream for a library like speaker or prism-media to play on-the-fly.


Browser Example (HTML + JS)

html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Suonora TTS Streaming Demo</title>
</head>
<body>
  <button id="play">Stream & Play</button>
  <audio id="player" controls></audio>
 
  <script>
    document.getElementById("play").onclick = async () => {
      const res = await fetch("https://api.suonora.com/v1/audio/stream", {
        method: "POST",
        headers: {
          Authorization: "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          input: "Streaming from Suonora with minimal latency.",
          model: "legacy-v2.5",
          voice: "axel"
        }),
      });
 
      if (!res.ok) return alert("TTS request failed: " + res.status);
 
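      // Feed the response stream into a MediaSource for immediate playback.
      // Not every browser supports audio/mpeg in MediaSource, so check first.
      const mime = "audio/mpeg";
      if (!("MediaSource" in window) || !MediaSource.isTypeSupported(mime)) {
        return alert("This browser cannot stream " + mime + " through MediaSource.");
      }
      const mediaSource = new MediaSource();
      const audio = document.getElementById("player");
      audio.src = URL.createObjectURL(mediaSource);
      // play() returns a promise; log a rejection instead of leaving it unhandled
      audio.play().catch(console.error);
 
      mediaSource.addEventListener("sourceopen", async () => {
        const sourceBuffer = mediaSource.addSourceBuffer(mime);
        const reader = res.body.getReader();
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          // appendBuffer is asynchronous: wait for "updateend" before the next append
          const appended = new Promise(r => sourceBuffer.addEventListener("updateend", r, { once: true }));
          sourceBuffer.appendBuffer(value);
          await appended;
        }
        mediaSource.endOfStream();
      }, { once: true });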
    };
  </script>
</body>
</html>

The MediaSource API feeds arriving chunks straight into the <audio> tag, so playback begins almost instantly.


Error Handling

Status codes and their meanings are identical to the normal endpoint (400, 401, 429, 500). For transient network issues during a long stream, reconnect and resend the request; Suonora’s engine will resume where possible.
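
One way to apply this is to wrap the request in a retry loop. In the sketch below, requestWithRetry is a hypothetical helper, and treating 429 and 500 as retryable while failing fast on 400 and 401 is an assumption based on common practice, not documented Suonora behaviour. It covers failures before the body starts; a connection that drops mid-stream would need the consuming code to resend the request as well.

```javascript
// Assumption: rate limiting and server errors are worth retrying.
const RETRYABLE = new Set([429, 500]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper. `doFetch` is any function that returns a Promise of
// a fetch-style Response, e.g. () => fetch(url, options).
async function requestWithRetry(doFetch, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    let response;
    try {
      response = await doFetch();
    } catch (err) {
      // Network-level failure (DNS, reset connection, ...).
      if (attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
      continue;
    }
    if (response.ok) return response;
    // Client errors such as 400 and 401 will not succeed on a retry.
    if (!RETRYABLE.has(response.status) || attempt >= retries) {
      throw new Error(`HTTP ${response.status}`);
    }
    // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

Passing the request as a function keeps the helper independent of the HTTP client and makes it easy to exercise with a stub.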


Best Practices (Streaming)

  • Keep connections alive. HTTP/2 or keep-alive sockets reduce handshake latency.
  • Back-pressure. Consume data continuously; otherwise the server may slow down or close the stream.
  • Progressive UI. Start playing audio as soon as the first chunks arrive for the snappiest UX.
  • Reuse logic. The same caching, retry, and chunk-splitting advice from the normal API applies here.
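
The chunk-splitting advice is not reproduced on this page, so the following is only one plausible reading of it: split text that exceeds the 5 000-character input limit at sentence boundaries and send each piece as its own request. splitText is a hypothetical helper, and the sentence detection is deliberately naive.

```javascript
// Hypothetical helper: split `text` into pieces of at most `limit`
// characters, preferring sentence boundaries (., !, ?).
function splitText(text, limit = 5000) {
  // A sentence is a run of text ending in punctuation plus trailing
  // whitespace, or whatever unpunctuated text remains at the end.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length + sentence.length <= limit) {
      current += sentence;
      continue;
    }
    if (current.trim()) chunks.push(current.trim());
    // A single sentence longer than the limit is cut at the limit.
    let rest = sentence;
    while (rest.length > limit) {
      chunks.push(rest.slice(0, limit));
      rest = rest.slice(limit);
    }
    current = rest;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Each returned piece can then be sent to /v1/audio/stream in order, with the resulting audio played back to back.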

Changelog

  • v1.3.0 (May 2025) – New endpoint /v1/audio/stream: real-time chunked MP3 streaming.

Enjoy ultra-low-latency speech synthesis with Suonora’s Streaming API!