July 16, 2025
If your website or app includes audio or video content, accessibility law probably requires you to provide captions or transcripts. The good news: modern speech-to-text makes this easier than ever. The bad news: auto-generated captions often don't meet compliance standards without human review.
This guide covers what WCAG actually requires and how to meet those requirements efficiently.
The Web Content Accessibility Guidelines (WCAG) set the standard for digital accessibility. For audio and video content:
Most organizations target Level AA compliance. If you're subject to ADA, Section 508, or similar regulations, AA is typically the minimum expectation.
Captions are synchronized with the video—text appears at the same time as the corresponding audio. They include speaker identification and relevant sound effects.
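To make the synchronization concrete, here is a minimal sketch that writes caption cues to WebVTT, a caption format that browsers support through the HTML `<track>` element. The `Cue` class and the helper names are hypothetical, chosen for illustration. Putting sound effects in square brackets is a common captioning convention rather than something the format requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    """One caption cue (hypothetical structure for this example)."""
    start: float  # seconds from the start of the media
    end: float
    text: str
    speaker: Optional[str] = None


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_cue_text(text: str) -> str:
    """Escape the characters WebVTT treats specially in cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;")


def to_webvtt(cues: list[Cue]) -> str:
    """Render cues as a WebVTT document with optional speaker voice tags."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        body = escape_cue_text(cue.text)
        if cue.speaker:
            # <v Name> is WebVTT's voice span, used here for speaker identification
            lines.append(f"<v {cue.speaker}>{body}")
        else:
            lines.append(body)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([
        Cue(0.0, 2.5, "Welcome back.", speaker="Host"),
        Cue(2.5, 4.0, "[applause]"),
    ]))
```

Each cue carries its own start and end time, which is what distinguishes a caption file from a plain transcript.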
Transcripts are standalone text documents containing everything that was said. They're not synchronized but provide a complete text alternative.
Both serve accessibility needs, but they're not interchangeable:
YouTube's auto-captions, Zoom's live transcription, and similar tools are a good starting point—but they typically don't meet WCAG standards:
Accuracy issues:
Formatting issues:
The standard: WCAG doesn't specify an accuracy percentage. The general expectation is that captions convey the same information as the audio, and 99% or higher accuracy is a commonly cited target for compliance.
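One way to check a caption file against an accuracy target is word error rate (WER), a standard speech-to-text metric. The sketch below compares auto-generated text with a human-verified reference. WCAG does not define this metric, so using it as a proxy for caption accuracy is an assumption. The function names and the normalization rules (lowercasing, dropping punctuation) are also illustrative choices.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # Levenshtein distance over words, keeping only the previous row
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "Captions must convey the same information as the audio."
    auto = "Captions must convey the same information has the audio."
    wer = word_error_rate(reference, auto)
    print(f"WER: {wer:.1%}, approximate word accuracy: {1 - wer:.1%}")
```

Word accuracy is roughly `1 - WER`. The metric ignores punctuation, speaker labels, and sound-effect descriptions, so a low WER alone does not show that captions are complete.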
For guidance on choosing transcription providers, see our STT API comparison.
Live captioning is harder. Options include:
For high-stakes live content (legal proceedings, official announcements), human captioners are still the standard. See our comparison of real-time vs. batch transcription for the technical tradeoffs.
Transcripts are required for prerecorded audio-only content (WCAG 1.2.1, Level A) and recommended as a supplement to video captions.
A good transcript includes:
Transcripts can be generated from captions or created separately. Many organizations publish them as downloadable documents or expandable text on the page.
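Generating a transcript from an existing caption file can be as simple as dropping the timestamps and merging consecutive cues from the same speaker. The sketch below assumes a simple WebVTT file in which speakers are marked with plain `<v Name>` voice tags. The function name is hypothetical, and the parser does not handle every WebVTT feature (voice tags with classes or inline styling, for instance).

```python
import re

# Matches a cue that starts with a plain voice tag, e.g. "<v Host>Welcome back."
VOICE_TAG = re.compile(r"^<v\s+([^>]+)>(.*)$")


def vtt_to_transcript(vtt: str) -> str:
    """Convert simple WebVTT text into a speaker-labeled transcript."""
    paragraphs: list[tuple[str, list[str]]] = []
    for block in vtt.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.strip().splitlines()
        timing_idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_idx is None:
            continue  # header or other block without a timing line
        text = " ".join(lines[timing_idx + 1:]).strip()
        if not text:
            continue
        match = VOICE_TAG.match(text)
        if match:
            speaker, spoken = match.group(1).strip(), match.group(2)
        else:
            speaker, spoken = "", text  # e.g. a sound effect such as "[applause]"
        spoken = spoken.replace("</v>", "").strip()
        # Merge consecutive cues from the same speaker into one paragraph
        if paragraphs and paragraphs[-1][0] == speaker:
            paragraphs[-1][1].append(spoken)
        else:
            paragraphs.append((speaker, [spoken]))
    return "\n\n".join(
        f"{speaker}: {' '.join(parts)}" if speaker else " ".join(parts)
        for speaker, parts in paragraphs
    )


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:00.000 --> 00:00:02.000\n<v Host>Welcome back.\n\n"
        "00:00:02.000 --> 00:00:04.000\n<v Host>Today we cover captions.\n\n"
        "00:00:04.000 --> 00:00:05.000\n[applause]\n\n"
        "00:00:05.000 --> 00:00:07.000\n<v Guest>Thanks for having me.\n"
    )
    print(vtt_to_transcript(sample))
```

The result is plain text that can be published on the page or offered as a download. It still needs the same human review as the captions it came from.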
Beyond captions, voice-enabled features need careful accessibility design. Platform implementations like Apple Voice Control and Microsoft Voice Access provide good reference points.
For a broader perspective on building accessible voice features, see our guides on voice control vs. dictation patterns and designing voice features that actually help.