December 10, 2025
You've picked a great TTS voice, but something still sounds off. The pronunciation of your product name is wrong. The pause between sentences feels rushed. A question sounds like a statement.
SSML (Speech Synthesis Markup Language) is how you fix these problems. It's a standard markup language that lets you control exactly how text-to-speech engines speak your content: pronunciation, pacing, emphasis, and more.
The good news: you don't need to learn the whole spec. A handful of tags solve 90% of real-world TTS issues.
<break>: Add pauses
The most useful tag by far. Insert pauses where you want the voice to breathe:
<speak>
Welcome to the demo. <break time="500ms"/> Let's get started.
</speak>
Use time to specify a duration in milliseconds or seconds (for example, 500ms or 2s), or strength for semantic pauses (such as weak, medium, or strong).
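If your text is generated rather than hand-written, the same pause can be added programmatically. The sketch below is one possible approach, not a standard API: the function name `add_sentence_breaks` and the `pause_ms` parameter are hypothetical, and the sentence splitting is deliberately naive (it does not handle abbreviations such as "Dr.").

```python
import re
from xml.sax.saxutils import escape


def add_sentence_breaks(text: str, pause_ms: int = 500) -> str:
    """Wrap plain text in <speak> and put a fixed pause between sentences."""
    # Naive split: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    pause = f' <break time="{pause_ms}ms"/> '
    # Escape &, < and > so the plain text cannot break the markup.
    return "<speak>" + pause.join(escape(s) for s in sentences) + "</speak>"


ssml = add_sentence_breaks("Welcome to the demo. Let's get started.")
print(ssml)
```

Escaping the text before inserting tags matters here: an unescaped ampersand in the input would make the result invalid XML.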
When to use it:
<emphasis>: Stress words
Tell the engine which words deserve emphasis:
<speak>
This is <emphasis level="strong">really</emphasis> important.
</speak>
Levels: reduced, moderate, strong
When to use it:
<say-as>: Pronounce things correctly
Handles formats that TTS engines otherwise mangle:
<speak>
Call us at <say-as interpret-as="telephone">8005551234</say-as>.
The meeting is on <say-as interpret-as="date">2025-03-15</say-as>.
</speak>
Useful interpret-as values: telephone, date, time, currency, characters (spell out), ordinal, cardinal. Supported values vary by engine, so check your provider's documentation.
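When phone numbers appear in text you don't write by hand, tagging them can be automated. This is a minimal sketch under stated assumptions: the helper name `tag_phone_numbers` is hypothetical, it only recognizes bare 10-digit numbers like the one in the example above, and whether your engine honors interpret-as="telephone" is something to verify with your provider.

```python
import re
from xml.sax.saxutils import escape

# Assumption: phone numbers appear as bare 10-digit runs, e.g. 8005551234.
PHONE = re.compile(r"\b(\d{10})\b")


def tag_phone_numbers(text: str) -> str:
    """Wrap 10-digit numbers in <say-as interpret-as="telephone">."""
    # With one capture group, split() puts the matched numbers at odd indexes.
    parts = PHONE.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(f'<say-as interpret-as="telephone">{part}</say-as>')
        else:
            out.append(escape(part))  # escape surrounding plain text
    return "<speak>" + "".join(out) + "</speak>"


print(tag_phone_numbers("Call us at 8005551234."))
```

Real-world numbers with spaces, dashes, or country codes would need a broader pattern than the one assumed here.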
When to use it:
<phoneme>: Override pronunciation
When the engine gets a word completely wrong, spell out how to say it:
<speak>
Welcome to <phoneme alphabet="ipa" ph="ˈtɑːkioʊ">Talkio</phoneme>.
</speak>
When to use it:
<prosody>: Control pitch, rate, and volume
<speak>
<prosody rate="slow" pitch="+10%">This part is spoken slowly and slightly higher.</prosody>
</speak>
Attributes: rate, pitch, volume (with values like slow, fast, percentages, or semitones)
When to use it:
For a deeper understanding of why prosody matters so much, see our piece on stress, rhythm, and AI voice quality.
Don't mark up everything. Start with plain text and add SSML only where something sounds wrong.
Over-engineering with SSML makes content hard to maintain. Use the lightest touch that solves the problem.
<speak> wrapper: Most engines require your SSML to be wrapped in <speak> tags.
For choosing which TTS provider to use in the first place, see our evaluation guide for product teams.
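A missing <speak> wrapper or an unescaped character is easy to catch before the markup ever reaches an engine. The sketch below uses Python's standard xml.etree.ElementTree parser; the function name `has_speak_root` is hypothetical, and the check only covers well-formedness and the root element, not whether a particular engine supports each tag.

```python
import xml.etree.ElementTree as ET


def has_speak_root(ssml: str) -> bool:
    """Return True if the string is well-formed XML with a <speak> root."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Not well-formed: missing wrapper, unclosed tag, bare '&', etc.
        return False
    # Drop a namespace prefix such as '{http://www.w3.org/2001/10/synthesis}'.
    return root.tag.rsplit("}", 1)[-1] == "speak"


print(has_speak_root('<speak>Hi. <break time="500ms"/> Bye.</speak>'))
print(has_speak_root('Hi. <break time="500ms"/> Bye.'))
```

Running a check like this in tests or a pre-publish step keeps malformed markup from surfacing as a runtime error from the TTS provider.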