Quick Tags
Generation Parameters
Emotion tags work best WITHOUT voice cloning.
Format: (emotion)Text with NO space after the closing parenthesis.
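The tag format above can be sketched as a small helper. This is a minimal illustration only: the function name `tag_text` and the example emotion `happy` are hypothetical and do not come from the app, and which emotion names are actually supported depends on the model.

```python
def tag_text(emotion: str, text: str) -> str:
    """Build a string in the (emotion)Text format.

    Hypothetical helper: strips whitespace so that no space follows
    the closing parenthesis, as the format note requires.
    """
    emotion = emotion.strip()
    if not emotion or "(" in emotion or ")" in emotion:
        raise ValueError("emotion must be a non-empty name without parentheses")
    # lstrip() removes any leading space that would end up after ')'
    return f"({emotion}){text.lstrip()}"


print(tag_text("happy", "  What a lovely day!"))
# (happy)What a lovely day!
```

Stripping the leading whitespace in one place avoids the easy mistake of typing "(happy) What a lovely day!", which breaks the no-space rule.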
Emotion Tags
Qwen3-TTS supports instruct mode for fine-grained voice control (emotion, speed, pitch, volume) and voice cloning from a reference audio file.
Instruct Tags
VoxCPM2 — 2B params, MiniCPM-4 backbone, Apache 2.0.
30 languages auto-detected from the input text, no language tag required · 48 kHz studio output via AudioVAE V2 (built-in super-resolution from 16 kHz reference audio) · Context-aware prosody inferred from the text itself · Real-time streaming (real-time factor, RTF ≈ 0.13; toggle below)
Three modes: Voice Design (text-only, describe a voice) · Controllable Cloning (reference + style instruction, timbre preserved) · Ultimate Cloning (audio continuation with transcript, max fidelity).
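To make the quoted RTF figure concrete: the real-time factor is synthesis time divided by the duration of the audio produced, so values below 1.0 mean audio is generated faster than it plays back. The sketch below only does that arithmetic. The function names are hypothetical, and 0.13 is the figure quoted above, not a measurement; actual speed depends on hardware and settings.

```python
def synthesis_seconds(audio_seconds: float, rtf: float = 0.13) -> float:
    """Estimate compute time from RTF = synthesis_time / audio_duration."""
    return audio_seconds * rtf


def is_faster_than_real_time(rtf: float) -> bool:
    """An RTF below 1.0 means audio is produced faster than it plays."""
    return rtf < 1.0


# Estimated time to synthesize 10 seconds of speech at the quoted RTF.
print(f"{synthesis_seconds(10.0):.2f} s")
# 1.30 s
print(is_faster_than_real_time(0.13))
# True
```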
Voice Design Presets
Languages (30 supported)
Chatterbox Turbo
TTS Hub
Ready