AI Audio is a full audio production toolkit. You can convert text to natural speech, generate music from a prompt, create sound effects, clone voices from samples, swap voices in existing recordings, dub content into other languages, and transcribe audio or video files to text. All audio operations are handled by the agent through the chat. Describe what you need and the agent selects the right tool.
[Image: chat interface showing a generated audio file with a waveform visualization and play/pause controls]

What you can do

Preset voices

For text-to-speech and dialogue, 6 preset voices are available:
Voice | Description
Rachel (default) | Neutral, clear, conversational
George | Male, warm tone
Sarah | Female, professional
Charlie | Male, casual
Lily | Female, friendly
Chris | Male, energetic
You can also use a cloned voice for any operation. See Voice Cloning.

Output format

All audio files are generated as MP3 (128 kbps, 44.1 kHz) or WAV. Files appear in the chat with a waveform player for instant playback. Click Download to save a file to your device.
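To get a rough idea of download sizes, the bitrate above can be turned into a quick estimate. The sketch below is an illustration only: the function names are hypothetical, the MP3 figure assumes a constant 128 kbps stream and ignores metadata, and the WAV figure assumes 16-bit stereo PCM at 44.1 kHz, which this page does not specify.

```python
def estimate_mp3_size_bytes(duration_seconds: float, bitrate_kbps: int = 128) -> int:
    """Approximate size of a constant-bitrate MP3 (metadata and padding ignored)."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return int(bits // 8)


def estimate_wav_size_bytes(
    duration_seconds: float,
    sample_rate_hz: int = 44100,  # assumption: same rate as the MP3 output
    channels: int = 2,            # assumption: stereo
    bits_per_sample: int = 16,    # assumption: 16-bit PCM
) -> int:
    """Approximate size of an uncompressed PCM WAV, including a 44-byte header."""
    bytes_per_second = sample_rate_hz * channels * (bits_per_sample // 8)
    return int(bytes_per_second * duration_seconds) + 44


if __name__ == "__main__":
    ten_minutes = 10 * 60
    print(estimate_mp3_size_bytes(ten_minutes) / 1_000_000, "MB (MP3, approx.)")
    print(estimate_wav_size_bytes(ten_minutes) / 1_000_000, "MB (WAV, approx.)")
```

Under these assumptions, a 10-minute MP3 comes to roughly 9.6 MB, while the same duration as WAV is on the order of 100 MB, so MP3 is usually the more practical choice for sharing.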

What AI Audio does not support

  • Real-time audio streaming or live voice interaction.
  • Merging or mixing two audio tracks together (for example, voice over background music).
  • Editing audio waveforms directly (trimming, cutting, fading). Use an external audio editor for post-production.
  • Generating music longer than 10 minutes in a single operation.
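Because trimming is not available in AI Audio, simple cuts have to happen after download. As one possible approach, the sketch below trims a downloaded WAV file using only Python's standard `wave` module. It is not part of AI Audio: the function name and file paths are hypothetical, and it assumes an uncompressed PCM WAV file (it does not handle MP3).

```python
import wave


def trim_wav(src_path: str, dst_path: str, start_s: float, end_s: float) -> int:
    """Copy the [start_s, end_s) range of a PCM WAV file to dst_path.

    Returns the number of audio frames written.
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")

    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both ends to the file length so over-long ranges do not fail.
        first = min(int(start_s * rate), total)
        last = min(int(end_s * rate), total)
        src.setpos(first)
        frames = src.readframes(last - first)

    with wave.open(dst_path, "wb") as dst:
        # Reuse the source format; the frame count in the header is
        # corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return last - first
```

For example, `trim_wav("narration.wav", "narration_trimmed.wav", 1.5, 12.0)` would keep the audio between 1.5 and 12 seconds of a hypothetical downloaded file. For fades, mixing, or MP3 files, a dedicated audio editor is the better fit.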

Next steps