TTS HUB / FYI.AI
Chatterbox · Fish Speech · Qwen3 · VoxCPM2 · MOSS-Nano
Quick Tags

Generation Parameters

Emotion tags work best without voice cloning. Format: (emotion)Text, with no space between the closing parenthesis and the text.
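The spacing rule above is easy to get wrong when text is assembled programmatically. The following is a minimal sketch of a helper that builds a tagged string; the function name `tag_text` and the tag `happy` are illustrative assumptions, not names taken from this page or from any model's API.

```python
def tag_text(emotion: str, text: str) -> str:
    """Return text prefixed with an emotion tag in the (emotion)Text form.

    Hypothetical helper: it only formats the string and does not check
    whether the model actually recognizes the given emotion name.
    """
    # Accept "happy", " happy ", or "(happy)" and normalize to the bare name.
    name = emotion.strip().strip("()").strip()
    if not name:
        raise ValueError("emotion must be a non-empty name")
    # Drop leading whitespace so nothing separates ")" from the text.
    return f"({name}){text.lstrip()}"


print(tag_text("happy", " Hello there"))
```

Stripping the leading whitespace in one place means callers do not have to remember the no-space rule themselves.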
Emotion Tags


Qwen3-TTS supports instruct mode for fine-grained voice control (emotion, speed, pitch, volume) and voice cloning from a reference audio file.
Instruct Tags


VoxCPM2: 2B parameters, MiniCPM-4 backbone, Apache 2.0 license.
30 languages auto-detected from the input text, no language tag required · 48 kHz studio output via AudioVAE V2 (built-in super-resolution from 16 kHz references) · Context-aware prosody inferred from the text itself · Real-time streaming (RTF ≈ 0.13; toggle below)
Three modes: Voice Design (text only, describe a voice) · Controllable Cloning (reference audio plus style instruction, timbre preserved) · Ultimate Cloning (audio continuation with transcript, maximum fidelity).
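To put the quoted RTF figure in concrete terms, here is a small arithmetic sketch. It assumes the usual definition of real-time factor (synthesis time divided by the duration of the audio produced), which this page does not spell out; the function names are hypothetical, and actual speed depends on hardware.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed definition: compute time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.13) -> float:
    """Estimated compute time for a clip, using the RTF quoted on this page."""
    return audio_seconds * rtf


def keeps_up_with_playback(rtf: float) -> bool:
    """Streaming can stay ahead of playback only while RTF is below 1."""
    return rtf < 1.0


# At RTF 0.13, a 10-second clip needs roughly 1.3 seconds of compute.
print(round(estimated_synthesis_seconds(10.0), 2))
print(keeps_up_with_playback(0.13))
```

An RTF below 1 is what makes streaming practical: audio is generated faster than it is played back.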
Voice Design Presets

Languages (30 supported)




Chatterbox Turbo