Voice
Clone any voice. 32 languages. Under 150ms.
Neural voice synthesis that sounds human. Clone voices from 30 seconds of audio, synthesize in any language, control emotion — with consent verification and watermarking built in.
Voice AI that respects consent
Studio-quality synthesis with built-in rights protection. Every cloned voice requires consent. Every generated audio carries a watermark.
Voice cloning
Create a neural voice model from 30 seconds of audio. 99.2% accuracy on objective speaker-similarity tests. Works across languages.
32 languages
Text-to-speech and voice cloning in 32 languages. Cross-lingual cloning lets you speak languages you don't know — in your own voice.
Emotion control
Eight emotion styles: neutral, happy, sad, excited, calm, angry, whisper, and broadcast. Adjust pace, pitch, and emphasis per sentence.
Sub-150ms synthesis
Real-time audio generation fast enough for live narration, accessibility descriptions, and interactive voice responses.
Studio-quality output
48kHz, 24-bit audio. Output in PCM, MP3, or Opus. Indistinguishable from human speech in blind listening tests.
Voice rights protection
Consent verification required before cloning. Inaudible watermarking embeds provenance. Compliant with emerging AI voice regulations.
Three steps to a custom voice
Upload 30 seconds of audio
Clear speech in a quiet environment. Any language. The model extracts pitch, timbre, cadence, and accent characteristics.
Train and verify
Neural model trains in under 5 minutes. Upload voice consent form. WAVE verifies consent before activating the voice profile.
Generate speech
Send text via API or dashboard. Choose emotion, language, and output format. Audio streams back in under 150ms. Watermark embedded automatically.
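The three steps above can be sketched as one async flow. This is illustrative only: the `clone` and `verifyConsent` method names and the inline stub client are assumptions standing in for the real WAVE SDK, so the end-to-end shape can be read in one place.

```javascript
// Stub client: method names and shapes are assumptions, not the real WAVE SDK.
const wave = {
  voice: {
    clone: async ({ sample }) => ({ voiceId: 'voice_clone_abc123', status: 'training' }),
    verifyConsent: async ({ voiceId }) => ({ voiceId, status: 'active' }),
    synthesize: async ({ text, voiceId }) => ({ voiceId, chars: text.length }),
  },
};

async function createCustomVoice() {
  // Step 1: upload 30 seconds of clear audio (filename is illustrative)
  const clone = await wave.voice.clone({ sample: 'sample_30s.wav' });
  // Step 2: upload the signed consent form; the profile activates only after verification
  const verified = await wave.voice.verifyConsent({
    voiceId: clone.voiceId,
    consentForm: 'consent.pdf',
  });
  // Step 3: synthesize speech with the activated, consent-verified voice
  return wave.voice.synthesize({ text: 'Welcome to the broadcast.', voiceId: verified.voiceId });
}
```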
Built for these workflows
Synthesis via API
One endpoint for text-to-speech, voice cloning, and emotion control.
// Generate speech with cloned voice
const audio = await wave.voice.synthesize({
text: 'Welcome to the broadcast.',
voiceId: 'voice_clone_abc123',
language: 'en',
emotion: 'broadcast',
format: 'mp3',
});
// Returns: audio stream (sub-150ms first byte)
Technical specifications
Frequently asked questions
How does voice cloning work?
Upload 30 seconds of clear speech. WAVE trains a neural voice model in under 5 minutes. The model captures pitch, timbre, cadence, and accent. Clone accuracy reaches 99.2% on objective speaker-similarity tests, and listeners rate the output on par with human speech in Mean Opinion Score (MOS) evaluations.
What languages are supported?
32 languages for text-to-speech and voice cloning. Cross-lingual cloning lets a cloned voice speak in any supported language — your voice, any language, no accent transfer artifacts.
Is the latency low enough for live use?
Yes. The full synthesis pipeline completes in under 150ms, fast enough for live narration, real-time accessibility audio descriptions, interactive voice response, and live dubbing. Streaming mode sends audio chunks as they are generated.
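A minimal sketch of why streaming keeps latency low: the consumer plays each chunk as it arrives instead of waiting for the whole file. The async-generator shape and per-sentence chunking below are assumptions, not the documented WAVE wire format.

```javascript
// Mimics streaming mode: yields audio chunks as they are "generated".
// The chunk shape and per-sentence split are illustrative assumptions.
async function* synthesizeStream(text) {
  for (const sentence of text.split('. ').filter(Boolean)) {
    yield { sentence, bytes: sentence.length * 96 }; // fake payload size
  }
}

// A live consumer starts playback on the first chunk, so first-audio
// latency stays near the per-chunk synthesis time.
async function playLive(text) {
  let chunksPlayed = 0;
  for await (const chunk of synthesizeStream(text)) {
    chunksPlayed += 1; // in real use: feed chunk bytes to the audio device here
  }
  return chunksPlayed;
}
```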
Can I use cloned voices commercially?
Yes, with consent verification. WAVE requires proof of voice consent before enabling a cloned voice profile. Inaudible watermarking tracks provenance. Compliant with California AB 602, EU AI Act voice provisions, and FTC guidance on AI-generated voice.
How does emotion control work?
Tag any sentence with one of 8 emotion styles. The model adjusts pitch contour, speaking rate, emphasis, and vocal quality to match. Fine-tune intensity from subtle to dramatic. SSML-compatible for programmatic control.
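Since the FAQ mentions SSML compatibility, here is a hypothetical builder for per-sentence emotion tags. SSML itself is a W3C standard, but the `<wave:emotion>` extension element and its `style` and `intensity` attributes are assumptions about how WAVE might expose this control, not confirmed markup.

```javascript
// Builds SSML with a hypothetical per-sentence emotion extension element.
// <wave:emotion> and its attributes are assumptions, not documented WAVE markup.
function emotionSSML(sentences) {
  const body = sentences
    .map(
      ({ text, emotion, intensity = 'medium' }) =>
        `<wave:emotion style="${emotion}" intensity="${intensity}"><s>${text}</s></wave:emotion>`
    )
    .join('');
  return `<speak>${body}</speak>`;
}
```

For example, `emotionSSML([{ text: 'Breaking news.', emotion: 'broadcast' }, { text: 'We won!', emotion: 'excited', intensity: 'high' }])` tags each sentence independently, matching the per-sentence control described above.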
What about voice consent and ethics?
Every cloned voice requires a signed consent form uploaded during creation. Voices without consent are blocked. Inaudible watermarks embed creator ID and timestamp in all generated audio for forensic verification.
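To make the forensic-verification idea concrete, here is a sketch of the provenance payload a watermark could carry (creator ID plus timestamp). The field names and base64 encoding are assumptions for illustration; real inaudible watermarking embeds these bits into the audio signal itself, which is out of scope here.

```javascript
// Illustrative provenance payload: field names and encoding are assumptions.
function encodeProvenance(creatorId, timestampMs) {
  const json = JSON.stringify({ creatorId, ts: timestampMs });
  return Buffer.from(json, 'utf8').toString('base64');
}

// Forensic verification reverses the encoding to recover who generated
// the audio and when.
function decodeProvenance(payload) {
  return JSON.parse(Buffer.from(payload, 'base64').toString('utf8'));
}
```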
Give your content a voice
10,000 free characters every month. No credit card required.