AI Audio - Runable Docs

AI Audio is a full audio production toolkit. You can convert text to natural speech, generate music from a prompt, create sound effects, clone voices from samples, swap voices in existing recordings, dub content into other languages, and transcribe audio or video files to text. All audio operations are handled by the agent through the chat. Describe what you need and the agent selects the right tool.

Chat interface showing a generated audio file with a waveform visualization and play/pause controls

What you can do

Text-to-Speech

Convert text into natural-sounding speech with 6 preset voices.

Multi-Speaker Dialogue

Generate conversations between multiple speakers from a script.

Music Generation

Compose original music from a text prompt describing genre and mood.

Sound Effects

Generate sound effects from a text description with loop support.

Voice Cloning

Clone any voice from audio samples and use it across all audio tools.

Voice Swap

Replace the voice in a recording while keeping emotion and timing.

Dubbing

Dub audio or video content into another language automatically.

Transcription

Convert speech to text with speaker labels and audio event tags.

Preset voices

For text-to-speech and dialogue, 6 preset voices are available:

Voice	Description
Rachel (default)	Neutral, clear, conversational
George	Male, warm tone
Sarah	Female, professional
Charlie	Male, casual
Lily	Female, friendly
Chris	Male, energetic

You can also use a cloned voice for any operation. See Voice Cloning.

Output format

All audio files are generated as MP3 (128 kbps, 44.1 kHz) or WAV. Files appear in the chat with a waveform player for instant playback. Click Download to save a file to your device.
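To show what the two formats mean for file size, here is a rough estimate. It is a minimal sketch, not part of the product. The MP3 figures use the 128 kbps bitrate stated above and assume a constant bitrate. The WAV figures assume uncompressed 16-bit stereo PCM at 44.1 kHz, because bit depth and channel count are not documented here. Headers and metadata are ignored.

```python
# Rough file-size estimates for the output formats described above.
# Assumptions (not stated in the docs): MP3 is constant bitrate at 128 kbps,
# and WAV is uncompressed 16-bit stereo PCM at 44.1 kHz. Container headers
# and metadata are ignored.

MP3_BITRATE_BPS = 128_000      # 128 kbps, from the output format description
WAV_SAMPLE_RATE_HZ = 44_100    # 44.1 kHz
WAV_BIT_DEPTH = 16             # assumed: 16 bits per sample
WAV_CHANNELS = 2               # assumed: stereo


def mp3_size_bytes(duration_s: float) -> int:
    """Approximate size of a constant-bitrate MP3 of the given duration."""
    # bits per second * seconds / 8 bits per byte
    return int(MP3_BITRATE_BPS * duration_s / 8)


def wav_size_bytes(duration_s: float) -> int:
    """Approximate size of uncompressed PCM audio (header excluded)."""
    bytes_per_second = WAV_SAMPLE_RATE_HZ * (WAV_BIT_DEPTH // 8) * WAV_CHANNELS
    return int(bytes_per_second * duration_s)


if __name__ == "__main__":
    for minutes in (1, 10):
        seconds = minutes * 60
        print(f"{minutes:>2} min: MP3 ≈ {mp3_size_bytes(seconds) / 1e6:.2f} MB, "
              f"WAV ≈ {wav_size_bytes(seconds) / 1e6:.2f} MB")
```

Under these assumptions, one minute of MP3 is about 0.96 MB and the same minute as WAV is about 10.58 MB. MP3 is the lighter choice for sharing. WAV suits further editing in an external audio editor.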

What AI Audio does not support

Real-time audio streaming or live voice interaction.
Merging or mixing two audio tracks together (for example, voice over background music).
Editing audio waveforms directly (trimming, cutting, fading). Use an external audio editor for post-production.
Generating music longer than 10 minutes in a single operation.

Next steps

Text-to-Speech

Convert your first text into natural speech.

Music Generation

Compose original music from a text description.
