TTS Hub - FYI.AI

Text

Quick Tags

Voice

Predefined Voice

Generation Parameters

Exaggeration 0.5

Temperature 0.8

CFG Weight 0.5

Speed 1.0

Seed (-1 = random)

Output Mode Stream
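The defaults listed above can be captured in a small settings object. This is only a sketch: the class name `GenerationParams`, its field names, and the `resolve_seed` helper are hypothetical and not taken from the hub's actual API. The 32-bit seed range is likewise an assumption. The only behaviour drawn from the page is the default values and the rule that a seed of -1 means "pick a random seed".

```python
import random
from dataclasses import dataclass


@dataclass
class GenerationParams:
    """Hypothetical container mirroring the defaults shown in the panel."""
    exaggeration: float = 0.5
    temperature: float = 0.8
    cfg_weight: float = 0.5
    speed: float = 1.0
    seed: int = -1            # -1 = random, as stated in the panel
    output_mode: str = "stream"


def resolve_seed(seed: int) -> int:
    """Return the seed unchanged, or draw a random one when seed is -1.

    The 32-bit range is an assumption made for illustration.
    """
    if seed == -1:
        return random.randrange(2**32)
    return seed


params = GenerationParams()
effective_seed = resolve_seed(params.seed)
```

Logging the resolved seed alongside the output would make a "random" generation reproducible later by entering that value in the Seed field.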

Emotion tags work best WITHOUT voice cloning. Format: (emotion)Text with NO space after the closing parenthesis.
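The tag format described above is easy to get wrong by leaving a space after the closing parenthesis. A minimal sketch of a formatter and a checker follows. The function names are hypothetical, and the emotion name `happy` is an assumed example, since the page does not list the available tags.

```python
import re

# Matches a leading "(emotion)" tag followed by the rest of the text.
LEADING_TAG = re.compile(r"^\((?P<emotion>[^()]+)\)(?P<rest>.*)$", re.DOTALL)


def tag_emotion(emotion: str, text: str) -> str:
    """Build '(emotion)Text' with no space after the closing parenthesis."""
    emotion = emotion.strip()
    if not emotion:
        raise ValueError("emotion must be non-empty")
    return f"({emotion}){text.lstrip()}"


def is_well_formed(tagged: str) -> bool:
    """True if the text starts with a tag and no whitespace follows it."""
    match = LEADING_TAG.match(tagged)
    if match is None:
        return False
    return not match.group("rest")[:1].isspace()


line = tag_emotion("happy", " Hello there")   # leading space is removed
ok = is_well_formed(line)
bad = is_well_formed("(happy) Hello there")   # space after ')' is rejected
```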

Text

Emotion Tags

Reference Audio (voice cloning - optional)

Click or drag audio file here

Reference Text (transcription of reference audio)

Qwen3-TTS supports instruct mode for fine-grained voice control (emotion, speed, pitch, volume) and voice cloning from a reference audio file.

Text

Instruct Tags

Voice Mode

Speaker

Instruct (voice style description)

Language

VoxCPM2: 2B parameters, MiniCPM-4 backbone, Apache 2.0 license.
30 languages, auto-detected from the input text with no language tag required · 48 kHz studio output via AudioVAE V2 (built-in super-resolution from 16 kHz reference audio) · Context-aware prosody inferred from the text itself · Real-time streaming (RTF ~ 0.13; toggle below)
Three modes: Voice Design (text only; describe a voice) · Controllable Cloning (reference audio plus a style instruction, timbre preserved) · Ultimate Cloning (audio continuation with a transcript, maximum fidelity).
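To put the RTF figure quoted above in concrete terms: real-time factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so any value below 1 is faster than real time. The sketch below assumes that conventional definition applies here. The function names are illustrative, and actual throughput will depend on hardware and input.

```python
def synthesis_seconds(audio_seconds: float, rtf: float = 0.13) -> float:
    """Estimated compute time for a clip, given RTF = compute time / audio duration."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


def is_faster_than_real_time(rtf: float) -> bool:
    """Streaming can keep up with playback only when RTF is below 1."""
    return rtf < 1.0


# A 10-second clip at RTF 0.13 takes roughly 1.3 seconds to synthesize.
estimate = synthesis_seconds(10.0)
```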

Mode

Target Text

Control Instruction (English / 中文)

Voice Design Presets

Languages (30 supported)

Language hint (model auto-detects from text)

Real-time streaming (chunked WAV, lower TTFB)

Mode

Target Text

Preset Voice

Real-time streaming (chunked PCM, ~100 ms TTFB)

Chatterbox Turbo
