Voice Control for ChatGPT

July 16, 2025

Captions, Transcripts, and WCAG: A Practical Compliance Guide

If your website or app includes audio or video content, accessibility law probably requires you to provide captions or transcripts. The good news: modern speech-to-text makes a first draft fast and cheap to produce. The bad news: auto-generated captions often don't meet compliance standards without human review.

This guide covers what WCAG actually requires and how to meet those requirements efficiently.

What WCAG requires

The Web Content Accessibility Guidelines (WCAG) set the standard for digital accessibility. For audio and video content:

Level A (minimum)

  • Captions for pre-recorded video (WCAG 1.2.2): synchronized text for all spoken content
  • Audio description or a full text alternative for pre-recorded video with important visual information (WCAG 1.2.3)
  • Transcripts for pre-recorded audio-only content such as podcasts (WCAG 1.2.1)

Level AA (standard target for most organizations)

  • Captions for live video (WCAG 1.2.4): real-time captioning for streams and broadcasts
  • Audio description for pre-recorded video (WCAG 1.2.5): narration of important visual content

Level AAA (highest)

  • Extended audio description (WCAG 1.2.7): pausing the video to make room for more detailed descriptions
  • Sign language interpretation for pre-recorded video (WCAG 1.2.6)

Level AA conformance also requires meeting every Level A criterion, so the AA items add to the Level A list rather than replace it. If you're subject to ADA, Section 508, or similar regulations, AA is typically the minimum expectation.

Captions vs. transcripts: what's the difference?

Captions are synchronized with the video—text appears at the same time as the corresponding audio. They include speaker identification and relevant sound effects.

Transcripts are standalone text documents containing everything that was said. They're not synchronized but provide a complete text alternative.

Both serve accessibility needs, but they're not interchangeable:

  • Deaf users generally prefer captions for video content—see our guide on STT for deaf and hard of hearing users
  • Deafblind users may rely on transcripts with screen readers
  • Search engines can index transcripts, improving discoverability

Why auto-captions aren't enough

YouTube's auto-captions, Zoom's live transcription, and similar tools are a good starting point—but they typically don't meet WCAG standards:

Accuracy issues:

  • Names and technical terms are frequently wrong
  • Homophones get confused ("their" vs. "there")
  • Background noise causes errors
  • Accents and speech patterns affect quality

Formatting issues:

  • No speaker identification
  • Missing punctuation or incorrect sentence breaks
  • Sound effects and music not indicated
  • Poor timing synchronization

The standard: WCAG doesn't specify an accuracy percentage. The expectation is that captions convey the same information as the audio, and a commonly cited target for compliance is 99% accuracy or better, which works out to no more than one wrong word in every hundred.
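One way to put a number on accuracy is word error rate (WER): the word-level edit distance between a human-verified reference and the auto-generated text, divided by the number of reference words. The Python sketch below is an illustration only. The function names are made up for this example, and the normalization (lowercasing and splitting on whitespace) is a simplifying assumption, so punctuation differences count as errors while missing speaker labels and sound cues are not scored at all.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def caption_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at zero for very poor hypotheses."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    reviewed = "their results are over there"
    auto = "there results are over there"
    print(f"accuracy: {caption_accuracy(reviewed, auto):.0%}")
```

Scoring a short, hand-corrected sample of each video this way gives a rough signal of how much review the rest of the file needs. It does not replace the review itself.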

For guidance on choosing transcription providers, see our STT API comparison.

A practical captioning workflow

For pre-recorded content

  1. Generate auto-captions using your video platform or a transcription service
  2. Review and correct — Focus on names, technical terms, and any obvious errors
  3. Add speaker labels — "[John]:" or similar indicators
  4. Add non-speech sounds — [music], [applause], [phone ringing]
  5. Check timing — Captions should appear close to when words are spoken
  6. Export in standard format — SRT or VTT files work across most platforms
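Steps 3 to 6 can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the Cue class and the helper names are hypothetical, the reviewed cues are assumed to already exist as start and end times in seconds, and speaker labels use the bracketed "[John]:" style from step 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float                   # seconds from the start of the video
    end: float
    text: str                      # spoken words, or a sound cue such as "[music]"
    speaker: Optional[str] = None  # leave as None for non-speech cues


def format_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS.mmm timestamp used by WebVTT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Cue]) -> str:
    """Serialize cues as a WebVTT document with bracketed speaker labels."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        label = f"[{cue.speaker}]: " if cue.speaker else ""
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(label + cue.text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    cues = [
        Cue(0.0, 2.5, "Welcome back.", speaker="John"),
        Cue(2.5, 4.0, "[applause]"),
    ]
    print(to_webvtt(cues))
```

SRT output would need two changes: a running number on its own line before each cue, and a comma instead of a period before the milliseconds.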

For live content

Live captioning is harder. Options include:

  • Professional CART providers — Human captioners typing in real-time (most accurate, most expensive)
  • AI live captioning — Faster and cheaper, but less accurate
  • Hybrid approaches — AI-assisted human captioning

For high-stakes live content (legal proceedings, official announcements), human captioners are still the standard. See our comparison of real-time vs. batch transcription for the technical tradeoffs.

Transcripts: when and how

Transcripts are required at Level A for pre-recorded audio-only content and recommended as a supplement to video captions.

A good transcript includes:

  • Speaker identification throughout—speaker diarization can help automate this
  • Descriptions of relevant sounds and context
  • Logical formatting with paragraphs and section breaks
  • Timestamps (optional but helpful for long content)

Transcripts can be generated from captions or created separately. Many organizations publish them as downloadable documents or expandable text on the page.
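Generating a transcript from an existing caption file is mostly a matter of dropping the timings and merging consecutive cues from the same speaker. The sketch below assumes a WebVTT file whose cues carry bracketed labels such as "[John]:" and whose sound cues look like "[applause]". The function name and both conventions are assumptions for illustration, not part of the WebVTT format.

```python
import re

# A cue timing line, e.g. "00:00:02.500 --> 00:00:05.000" (the hours part is optional).
CUE_TIMING = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->")
# A bracketed speaker label followed by a colon, e.g. "[John]: Hello".
SPEAKER = re.compile(r"^\[([^\]]+)\]:\s*(.*)$")


def vtt_to_transcript(vtt_text: str) -> str:
    """Turn labelled WebVTT captions into a plain-text transcript."""
    paragraphs = []  # each entry is [speaker or None, [text, ...]]
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        timing = next(
            (i for i, line in enumerate(lines) if CUE_TIMING.match(line)), None
        )
        if timing is None:
            continue  # header, NOTE, or STYLE block
        text = " ".join(line.strip() for line in lines[timing + 1:] if line.strip())
        if not text:
            continue
        labelled = SPEAKER.match(text)
        if labelled:
            speaker, text = labelled.group(1), labelled.group(2)
            if paragraphs and paragraphs[-1][0] == speaker:
                paragraphs[-1][1].append(text)  # same speaker keeps talking
            else:
                paragraphs.append([speaker, [text]])
        else:
            # Sound cues and unlabelled text each get their own paragraph.
            paragraphs.append([None, [text]])
    return "\n\n".join(
        (f"{speaker}: " if speaker else "") + " ".join(parts)
        for speaker, parts in paragraphs
    )


if __name__ == "__main__":
    sample = """WEBVTT

00:00:00.000 --> 00:00:02.500
[John]: Welcome back.

00:00:02.500 --> 00:00:05.000
[John]: Today we cover captions.

00:00:05.000 --> 00:00:06.000
[applause]
"""
    print(vtt_to_transcript(sample))
```

The result still needs an editorial pass for paragraph breaks, section headings, and any visual context the captions never mentioned.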

Common compliance mistakes

  • Relying solely on auto-captions without review
  • Missing speaker identification in multi-person content
  • Ignoring non-speech audio that conveys meaning
  • Poor timing that makes captions hard to follow
  • No transcripts for audio-only content like podcasts
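Some of these mistakes can be flagged automatically before a human review. The following Python sketch checks a list of (start, end, text) cues for timing problems and missing speaker labels. The function name is hypothetical, and the default of 20 characters per second is an assumed placeholder for a reading-speed limit, not a WCAG requirement. Non-speech audio that was never captioned cannot be detected this way and still needs someone to listen.

```python
import re
from typing import List, Tuple

SPEAKER_LABEL = re.compile(r"^\[[^\]]+\]:")  # e.g. "[John]: Hello"
SOUND_ONLY = re.compile(r"^\[[^\]]+\]$")     # e.g. "[music]"


def lint_cues(
    cues: List[Tuple[float, float, str]],
    max_chars_per_second: float = 20.0,  # assumed threshold, tune per style guide
    multi_speaker: bool = True,
) -> List[Tuple[int, str]]:
    """Return (cue number, problem) pairs for cues that need attention."""
    problems = []
    previous_end = 0.0
    for number, (start, end, text) in enumerate(cues, start=1):
        text = text.strip()
        duration = end - start
        if duration <= 0:
            problems.append((number, "end time is not after start time"))
        elif len(text) / duration > max_chars_per_second:
            # The speaker label is counted as part of the text to be read.
            problems.append((number, "too much text for the time on screen"))
        if start < previous_end:
            problems.append((number, "overlaps the previous cue"))
        if (
            multi_speaker
            and not SPEAKER_LABEL.match(text)
            and not SOUND_ONLY.match(text)
        ):
            problems.append((number, "no speaker label"))
        previous_end = max(previous_end, end)
    return problems


if __name__ == "__main__":
    cues = [
        (0.0, 2.0, "[John]: Welcome back."),
        (1.5, 3.0, "Thanks for having me."),
        (4.0, 5.0, "[music]"),
    ]
    for number, problem in lint_cues(cues):
        print(f"cue {number}: {problem}")
```

A clean report only means the file is well-formed. Whether the words are right is still a question for the reviewer.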

Making your voice features accessible

Beyond captions, voice-enabled features need careful accessibility design. Platform implementations like Apple Voice Control and Microsoft Voice Access provide good reference points.

For a broader perspective on building accessible voice features, see our guides on voice control vs. dictation patterns and designing voice features that actually help.

©2025 Aidia ApS. All rights reserved.