Voice Control for ChatGPT

February 4, 2026

Word Error Rate (WER) Explained: The Metric Behind Speech Recognition Accuracy

When someone claims their speech recognition system has "95% accuracy," what does that actually mean? Usually they mean a Word Error Rate (WER) of about 5%. "Accuracy" in such claims is typically shorthand for 1 − WER, and WER is the standard metric for measuring how well a speech-to-text system performs.

Understanding WER helps you cut through marketing claims and evaluate whether a transcription service will actually work for your use case. Let's break it down.

What WER actually measures

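Word Error Rate compares a system's transcript against a "ground truth" reference transcript. The two are aligned word by word using the fewest possible edits, and each edit in that alignment is counted as one of three error types: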

  • Substitutions: The system heard the wrong word ("cat" instead of "cap")
  • Deletions: The system missed a word entirely
  • Insertions: The system added a word that wasn't spoken

The formula is straightforward:

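WER = (Substitutions + Deletions + Insertions) / Total Words in Reference

Because insertions add errors without adding reference words, WER can exceed 100% when a system produces many extra words.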

So if you spoke 100 words and the system made 5 substitutions, 2 deletions, and 3 insertions, your WER would be 10%.
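The formula and the 10% example can be checked with a short script. This is a minimal sketch, not any particular toolkit's implementation: it assumes transcripts are already normalized (lowercased, punctuation removed) and split on whitespace, and it uses a standard Levenshtein alignment to attribute each error to a substitution, deletion, or insertion. `WerResult` and `word_error_rate` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WerResult:
    substitutions: int
    deletions: int
    insertions: int
    reference_words: int

    @property
    def wer(self) -> float:
        # (S + D + I) / N, where N counts reference words only
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_words


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    """Align two transcripts with minimum edit distance and count S, D, I."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # cost[i][j] = fewest edits turning ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, len(hyp) + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # deletion
                cost[i][j - 1] + 1,        # insertion
            )
    # Walk back through the table to attribute each edit to S, D, or I.
    s = d = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            s += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return WerResult(s, d, ins, len(ref))


# The worked example above: 5 S + 2 D + 3 I over 100 reference words.
print(WerResult(5, 2, 3, 100).wer)  # 0.1
```

When several alignments tie for the minimum cost, different tools may split the errors between S, D, and I differently, but the total, and therefore the WER, is the same.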

Why WER doesn't tell the whole story

Here's where it gets tricky. A 5% WER sounds great, but it doesn't tell you:

  • Which words were wrong: Missing "not" in "do not deploy to production" is catastrophic. Missing "the" is barely noticeable.
  • Punctuation and formatting: Most WER calculations ignore punctuation entirely, but a transcript without commas or periods is painful to read.
  • Casing: Scoring scripts usually lowercase text before comparing, so "apple" vs "Apple" often isn't counted as an error, but it matters when you're transcribing proper nouns.
  • Domain vocabulary: A system might achieve a low WER on general speech but completely fail on medical terms, legal jargon, or your company's product names.
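To see why casing and punctuation usually vanish from the score, and why WER can't distinguish a harmless deletion from a dangerous one, here is a rough sketch. It assumes a normalization step that lowercases text and strips ASCII punctuation (common, but tools differ). `dropped_critical_words` is a hypothetical helper that compares word counts rather than performing a full alignment, so it only flags critical words that occur fewer times in the hypothesis than in the reference.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and strip ASCII punctuation, as many WER scripts do before scoring."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def dropped_critical_words(reference: str, hypothesis: str, critical: set[str]) -> list[str]:
    """Return critical words (given in lowercase) that occur less often in the hypothesis."""
    ref_counts = Counter(normalize(reference))
    hyp_counts = Counter(normalize(hypothesis))
    return sorted(w for w in critical if hyp_counts[w] < ref_counts[w])


reference = "Do not deploy to production."
missing_not = "do deploy to production"  # 1 deletion out of 5 words: 20% WER
missing_to = "Do not deploy production"  # also 1 deletion: 20% WER

# Casing and punctuation disappear after normalization.
assert normalize("Apple.") == normalize("apple")

print(dropped_critical_words(reference, missing_not, {"not"}))  # ['not']
print(dropped_critical_words(reference, missing_to, {"not"}))   # []
```

Both hypotheses have the same WER, yet only one of them reverses the instruction.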

What "good" WER looks like

Benchmarks vary widely with audio quality and content. Recent research comparing speech-to-text services suggests these rough ranges:

  • Studio-quality recordings, common vocabulary: 2-5% WER is achievable with modern systems
  • Phone calls or meetings: 10-15% WER is more realistic
  • Noisy environments or heavy accents: 20%+ WER is common, even with good systems

The key insight: always test with audio that matches your actual use case. Vendor benchmarks using clean recordings from datasets like LibriSpeech or TED-LIUM don't predict performance on your team's chaotic Zoom calls.
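Percentages can hide how noisy a transcript actually feels. One rough way to make WER tangible is to convert it into errors per minute of speech. This sketch assumes a conversational pace of about 150 words per minute, an assumed round number rather than a measured one; substitute your own speakers' rate. `errors_per_minute` is a hypothetical helper.

```python
def errors_per_minute(wer: float, words_per_minute: float = 150.0) -> float:
    """Approximate transcription errors per minute of speech.

    Assumes errors scale with the number of reference words spoken; the
    150 words-per-minute default is an assumed pace, not a measured value.
    """
    return wer * words_per_minute


# Upper ends of the ranges listed above.
for label, wer in [("studio", 0.05), ("phone/meeting", 0.15), ("noisy", 0.20)]:
    print(f"{label}: ~{errors_per_minute(wer):.1f} errors per minute")
```

At 10% WER and that assumed pace, you would expect roughly 15 errors every minute, so even a "90% accurate" meeting transcript can need noticeable cleanup.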

How to evaluate STT for your needs

Instead of chasing headline WER numbers, focus on what matters for your specific application (we cover this in detail in our guide to comparing speech-to-text APIs):

  • Collect 20-50 real audio samples from your target environment and transcribe them carefully by hand, since WER needs a trusted reference to compare against
  • Track failure modes: Are errors random or clustered around specific words, speakers, or conditions?
  • Consider the downstream impact: A dictation app can tolerate different errors than a voice command system
  • Test edge cases: Accents, background noise, overlapping speakers, technical vocabulary
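Once you have hand-checked reference transcripts, a small script can pool the results and group them by recording condition, which makes clustered failures easier to spot. This is a sketch under assumptions: the `Sample` fields and condition labels are hypothetical, text is normalized only by lowercasing and whitespace splitting, and WER is pooled per condition (total errors divided by total reference words) rather than averaged per clip.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    reference: str   # hand-checked transcript
    hypothesis: str  # output from the STT system under test
    condition: str   # your own label, e.g. "quiet", "noisy", "accented"


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (S + D + I in the best alignment)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j - 1] + (r != h), prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1]


def wer_by_condition(samples: list[Sample]) -> dict[str, float]:
    """Pool errors and reference words per condition, then divide.

    Assumes every condition has at least one reference word.
    """
    errors: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for s in samples:
        ref, hyp = s.reference.lower().split(), s.hypothesis.lower().split()
        errors[s.condition] += edit_distance(ref, hyp)
        words[s.condition] += len(ref)
    return {c: errors[c] / words[c] for c in words}


samples = [
    Sample("turn on the lights", "turn on the lights", "quiet"),
    Sample("call mom now", "call tom now", "noisy"),
    Sample("set a timer", "set timer", "noisy"),
]
for condition, rate in sorted(wer_by_condition(samples).items()):
    print(f"{condition}: {rate:.1%}")
```

Pooling keeps a three-word clip from counting as much as a three-minute recording; averaging per-clip WER instead gives every clip equal weight, so it is worth stating which one you report.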

For noisy environments specifically, the CHiME benchmarks provide valuable insights into how systems perform under challenging acoustic conditions.

Beyond WER: other metrics worth knowing

Depending on your use case, these might matter more:

  • Character Error Rate (CER): The same formula as WER, computed over characters instead of words; useful for languages without clear word boundaries, such as Chinese or Japanese
  • Sentence Error Rate (SER): What percentage of sentences have any error at all?
  • Real-Time Factor (RTF): Processing time divided by audio duration. An RTF below 1 means the system transcribes faster than real time

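Each of these metrics can be sketched in a few lines. Assumptions: CER here counts every character including spaces (tools differ on whether spaces count), SER treats a sentence as wrong if any word differs after whitespace splitting, and RTF is measured by timing a `transcribe` callable, a hypothetical stand-in for whatever STT call you are benchmarking.

```python
import time
from typing import Callable, Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j - 1] + (r != h), prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Same formula as WER, but the units are characters instead of words."""
    return levenshtein(reference, hypothesis) / len(reference)


def sentence_error_rate(pairs: list[tuple[str, str]]) -> float:
    """Share of (reference, hypothesis) sentence pairs containing any word error."""
    wrong = sum(1 for ref, hyp in pairs if ref.split() != hyp.split())
    return wrong / len(pairs)


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Processing time / audio duration; below 1.0 means faster than real time."""
    start = time.perf_counter()
    transcribe()
    return (time.perf_counter() - start) / audio_seconds


# One substituted character in a four-character Chinese phrase.
print(character_error_rate("我喜欢猫", "我喜欢狗"))  # 0.25
```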
If you're building accessible voice features, weight accuracy metrics differently: missing a critical word matters more than the overall WER.

©2025 Aidia ApS. All rights reserved.