Multi-speaker dialogue generates audio with two or more distinct voices, each speaking the lines assigned to it. You write the script in a simple format, and the agent produces a single audio file that contains all speakers.

How to generate dialogue

1. Write or provide your script

Write each line of the script as a speaker name, a colon, and then what that speaker says. The agent detects this format automatically.

A two-person podcast intro.

2. Assign voices (optional)

If you do not assign voices, the agent automatically gives each speaker a different voice, chosen round-robin from the voice pool. To assign specific voices, mention them in your prompt:

Dialogue with specific voice assignments.

3. Review and download

The agent generates a single audio file with all speakers. Each speaker has a distinct voice. The file appears in the chat with a waveform player.
[Image: Chat showing a generated dialogue audio file with waveform visualization and play controls]
Multi-speaker dialogue requires at least 2 unique speakers in the script. If only one speaker is detected, the agent falls back to single-voice text-to-speech.

Script format

The agent detects multi-speaker scripts automatically when lines follow this pattern:
SpeakerName: Their dialogue line here.
AnotherSpeaker: Their response here.
Speaker names can be anything. The agent assigns a distinct voice to each name, as long as the voice pool contains enough voices.
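
To make the pattern concrete, here is a minimal Python sketch of how a script in this format could be recognized. It is an illustration only, not the agent's actual detection logic: the regular expression, the 40-character limit on names, and the helper names `parse_script` and `is_multi_speaker` are all assumptions made for this example.

```python
import re

# Assumed pattern: a short speaker name, a colon, then the spoken text.
# The 40-character cap on names is an arbitrary choice for this sketch.
SPEAKER_LINE = re.compile(r"^\s*([^:\n]{1,40}?)\s*:\s*(\S.*)$")


def parse_script(script):
    """Return (speaker, text) pairs for every line shaped like 'Name: text'."""
    turns = []
    for line in script.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


def is_multi_speaker(turns):
    """True when the script names at least two unique speakers."""
    return len({speaker for speaker, _ in turns}) >= 2


# Hypothetical sample script, written in the format described above.
sample = """Host: Welcome to the show.
Guest: Thanks for having me.
Host: Let's get started."""

turns = parse_script(sample)
mode = "dialogue" if is_multi_speaker(turns) else "single-voice"
```

With the sample script, `turns` holds three entries from two unique speakers, so `mode` is `"dialogue"`. A script with only one speaker name would yield `"single-voice"`, mirroring the fallback to text-to-speech described earlier. A pattern this simple would also treat text such as `Note: remember this` as a speaker line, which is one reason real detection may differ.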

What you cannot do

  • You cannot assign different emotions or tones per line. The voice model interprets tone naturally from the text.
  • You cannot control pauses between speakers.
  • You cannot use more distinct voices than the voice pool contains. If the script has more speakers than available voices, some speakers may share a voice.
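
The voice-sharing limit follows directly from round-robin assignment. The Python sketch below shows the general technique under stated assumptions: the function name `assign_voices` and the voice identifiers `voice_a` and `voice_b` are hypothetical, and the agent's real pool and ordering are not documented here.

```python
from itertools import cycle


def assign_voices(speakers, voice_pool):
    """Give each new speaker the next voice in the pool, wrapping around at the end."""
    if not voice_pool:
        raise ValueError("voice_pool must not be empty")
    voices = cycle(voice_pool)  # endlessly repeats the pool in order
    assignment = {}
    for speaker in speakers:
        # Only the first appearance of a speaker consumes a voice.
        if speaker not in assignment:
            assignment[speaker] = next(voices)
    return assignment


# Hypothetical two-voice pool and a script with three unique speakers.
pool = ["voice_a", "voice_b"]
result = assign_voices(["Host", "Guest", "Caller", "Host"], pool)
```

Here `result` maps `Host` to `voice_a`, `Guest` to `voice_b`, and `Caller` back to `voice_a`: the third speaker wraps around to the start of the pool and shares a voice with the first.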

Next steps