November 19, 2025
For deaf and hard of hearing users, speech-to-text isn't a convenience—it's access. Live captions in meetings, transcripts of phone calls, and real-time subtitles can mean the difference between participation and exclusion.
But accuracy metrics don't capture everything that matters for this community. Here's what product teams should understand when building STT for deaf and hard of hearing users.
Word Error Rate tells you how many words are wrong, but not:

- Which errors are critical. Missing a name, a number, or a key term is very different from missing "um" or "the." For deaf users relying on captions, critical-information errors cause real harm (the sketch after this list shows the difference). See our WER explainer for what the metric actually measures.
- How errors cluster. Five errors scattered across a paragraph might be fine; five errors clustered in one sentence can make it incomprehensible.
- What's happening beyond the words. Who's speaking? Is there laughter, applause, music, a phone ringing? This context matters for understanding what's happening.
- Whether the display works. Captions that lag behind speech, appear in overwhelming chunks, or flash by too quickly fail users even when the words are right.
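To make the first point concrete, here's a minimal Python sketch contrasting overall WER with a separate check on critical terms. The sample sentences and the critical_terms list are hypothetical; the point is that a low WER can coexist with a harmful miss.

```python
# Minimal sketch: WER treats every word equally, but a separate check on
# critical terms (names, numbers, key vocabulary) can reveal harmful misses.
# The sample sentences and critical_terms below are illustrative only.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def missed_critical_terms(hypothesis: str, critical_terms: list[str]) -> list[str]:
    """Critical terms that never appear in the captions."""
    hyp_words = set(hypothesis.lower().split())
    return [t for t in critical_terms if t.lower() not in hyp_words]

reference  = "take 40 mg of prednisone before the appointment with Dr Okafor"
hypothesis = "take 14 mg of prednisone before the appointment with Dr Okafor"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # ~9%, looks acceptable
print("Missed:", missed_critical_terms(hypothesis, ["40", "Okafor"]))  # but '40' is gone
```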
In multi-person conversations, knowing who said what is essential. Good speaker diarization isn't optional—it's core functionality.
Display approaches vary: a speaker name or label before each line, per-speaker color coding, or positioning captions near the speaker in a video layout. The sketch below shows the simplest of these, a label on each speaker change.
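Here's a small sketch of that labeling approach, rendering diarized segments into caption lines. The Segment structure is a simplified assumption, not any particular STT API's output format.

```python
# Sketch: turning diarized STT output into speaker-labeled caption lines.
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str   # diarization label, e.g. "spk_0", or a resolved name
    text: str

def render_captions(segments: list[Segment]) -> list[str]:
    """Prefix a speaker label only when the speaker changes."""
    lines, last_speaker = [], None
    for seg in segments:
        if seg.speaker != last_speaker:
            lines.append(f"[{seg.speaker}]: {seg.text}")
            last_speaker = seg.speaker
        else:
            lines.append(seg.text)
    return lines

print("\n".join(render_captions([
    Segment("Amira", "Can everyone see the agenda?"),
    Segment("Amira", "We have three items today."),
    Segment("Ben", "Yes, it's up on my screen."),
])))
```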
Important non-speech audio needs indication: bracketed tags such as [laughter], [applause], [music], or [phone ringing] are the established convention.
Automated sound detection is improving but often needs human review for important content.
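A sketch of how that review gate might work: detected sound events become bracketed caption tags only above a confidence threshold, and low-confidence events are deferred to a human. The event names, scores, and threshold are all illustrative, not any real classifier's output.

```python
# Sketch: mapping detected sound events to bracketed caption tags.
# Event names, confidences, and the threshold are illustrative; a real
# system would get these from an audio-event classifier.
EVENT_TAGS = {
    "laughter": "[laughter]",
    "applause": "[applause]",
    "music": "[music]",
    "phone_ring": "[phone ringing]",
}
CONFIDENCE_THRESHOLD = 0.8  # below this, defer to human review

def tag_sound_event(event: str, confidence: float) -> str | None:
    if event not in EVENT_TAGS:
        return None
    if confidence < CONFIDENCE_THRESHOLD:
        return None  # defer to human review rather than risk a wrong tag
    return EVENT_TAGS[event]

print(tag_sound_event("laughter", 0.95))  # [laughter]
print(tag_sound_event("applause", 0.55))  # None -> human review
```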
Captions need to be readable, not just accurate. Design choices affect readability: font size and contrast, line length, how text is chunked into blocks, where captions sit on screen, and whether words scroll in one at a time or appear in stable blocks.
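One piece of this is mechanical enough to sketch: chunking transcript text into short caption blocks. The 32-character lines and two-line blocks echo common broadcast captioning practice, but treat the exact numbers as tunable assumptions.

```python
# Sketch: chunking transcript text into short, stable caption blocks.
# Line width and lines-per-block are tunable assumptions.
import textwrap

MAX_CHARS_PER_LINE = 32
LINES_PER_BLOCK = 2

def to_caption_blocks(text: str) -> list[str]:
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    return ["\n".join(lines[i:i + LINES_PER_BLOCK])
            for i in range(0, len(lines), LINES_PER_BLOCK)]

for block in to_caption_blocks(
    "Captions that arrive in huge paragraphs overwhelm readers, "
    "so split them into short, stable blocks instead."
):
    print(block, end="\n---\n")
```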
Deaf users are generally tolerant of reasonable caption delay (1-3 seconds), but longer delays create disorientation—speakers have moved on while captions catch up.
For live events, the tradeoff between accuracy and latency is real. Communicate what users should expect. See our comparison of real-time vs. batch transcription for the architectural considerations.
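A small sketch of monitoring that expectation, flagging segments whose caption delay exceeds the window described above; the timestamps are invented for illustration.

```python
# Sketch: flagging caption segments whose end-to-end delay exceeds the
# 1-3 second window deaf users generally tolerate. Timestamps are invented.
ACCEPTABLE_DELAY_S = 3.0

def late_segments(segments: list[tuple[float, float]]) -> list[int]:
    """segments: (speech_time, caption_display_time) pairs in seconds."""
    return [i for i, (spoken, shown) in enumerate(segments)
            if shown - spoken > ACCEPTABLE_DELAY_S]

timeline = [(10.0, 11.2), (14.0, 15.5), (20.0, 24.8)]  # last one lags 4.8 s
print("Segments over budget:", late_segments(timeline))  # [2]
```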
Automated captions that would be merely annoying for hearing users can be exclusionary for deaf users. Quality matters more, not less.
Deaf users often can't tell when captions fail, because there's no audio to check them against. Provide other ways to flag issues, and actually act on feedback.
Deaf users have diverse preferences and needs. Hard of hearing users may use captions differently than profoundly deaf users. Customization helps.
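As a sketch of what customization could cover, here's a hypothetical settings structure; the fields and defaults are assumptions about useful options, not a spec.

```python
# Sketch of user-adjustable caption settings; fields and defaults are
# illustrative assumptions, not a specification.
from dataclasses import dataclass

@dataclass
class CaptionSettings:
    font_size_px: int = 24
    text_color: str = "#FFFFFF"
    background_color: str = "#000000"
    background_opacity: float = 0.75   # translucent backing aids contrast
    position: str = "bottom"           # "bottom" | "top"
    show_speaker_labels: bool = True
    show_sound_tags: bool = True       # [laughter], [music], etc.

# A hard of hearing user following along with partial audio might prefer:
settings = CaptionSettings(font_size_px=18, show_sound_tags=False)
print(settings)
```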
Bolting captions onto a product designed without them creates awkward experiences. Design for captions from the start.
For high-stakes situations, human captioning often remains necessary: legal proceedings, medical appointments, emergency communications, and classroom instruction, where CART (Communication Access Realtime Translation) services are well established.
Automatic captions are improving rapidly, but knowing their limitations prevents harmful failures.
Research consistently shows deaf users want:

- Accuracy on names, numbers, and key terms, not just a low overall error rate
- Clear, reliable speaker identification
- Indication of meaningful non-speech sounds
- Captions that keep pace with speech without arriving in overwhelming chunks
- Control over how captions look and behave
Build with these priorities, not just WER benchmarks.
The W3C Web Content Accessibility Guidelines (WCAG) set requirements for caption quality. Our WCAG compliance guide covers the specific requirements for audio content.
The National Association of the Deaf advocates for quality captioning standards and can provide guidance on community needs.
For broader voice feature accessibility, see our guide on designing voice features that actually help.