This page presents comprehensive benchmark tests comparing aiOla's Jargonic V2 against general-purpose ASR models, including Whisper Large V3, AssemblyAI Best, Deepgram Nova-3, and ElevenLabs Scribe. These generalist models typically fall short in three areas:
Industry-Specific Terminology – Poor handling of complex, company-specific jargon in manufacturing, logistics, and operations, leading to errors and inefficiencies.
Diverse Acoustic Conditions – High error rates in real-world environments with background noise, high decibel levels, and multiple speakers.
Real-World Conversational Speech – Difficulty recognizing spontaneous, overlapping, multilingual, and accented speech.
Jargonic V2 solves these challenges, delivering best-in-class transcription accuracy, real-time spoken data structuring, and seamless AI integration.
Performance Benchmarks: How aiOla’s Jargonic V2 Outperforms the Competition
We benchmarked Jargonic V2 against leading ASR models using industry-standard transcribed speech datasets, including:
LibriSpeech (EN Clean & Noisy)
TEDLIUM V3 (TED Talks)
AMI Meeting Corpus (Conversational Meetings)
CommonVoice V13 (Multilingual Speech)
AISHELL & MAGIC (Mandarin Chinese)
Earnings Call Dataset (Financial & Business Speech)
English Speech Recognition Across Multiple Datasets
Lower WER = Higher Accuracy
Key Findings:
Jargonic V2 achieves the lowest WER across most English datasets.
LibriSpeech Clean: Jargonic V2 (1.8%) vs. Whisper large V3 (2.0%)
AMI Meetings: Jargonic V2 (15.1%), outperforming Whisper, ElevenLabs, and Deepgram.
CommonVoice V13 (Multilingual Speech): Jargonic V2 delivers 5.2% WER, better than Whisper Large V3 and AssemblyAI Best.
Chinese Speech Recognition Across Key Datasets
Lower CER = Higher Accuracy in Mandarin
AISHELL (Quiet, Controlled Mandarin Speech) – Jargonic V2 leads with 4.7% CER vs. Whisper large V3 (8.9%).
When evaluating keyword spotting, two key metrics are considered. The first is Word Error Rate (WER; lower is better), which assesses how accurately the system transcribes the full utterance. The second is Keyword Recall (higher is better), defined as the number of correctly transcribed keywords divided by the total number of keywords; it measures how reliably the system captures the keywords themselves.
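The two metrics above can be sketched in a few lines of Python. This is an illustrative implementation only, not aiOla's evaluation code: WER is the word-level Levenshtein (edit) distance divided by the reference length, CER is the same computation at character level, and keyword recall here uses a simple whole-word match.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    """Character Error Rate: same as WER, computed over characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

def keyword_recall(keywords, hypothesis):
    """Fraction of keywords that appear (as whole words) in the transcript."""
    hyp_words = set(hypothesis.lower().split())
    return sum(1 for k in keywords if k.lower() in hyp_words) / len(keywords)
```

For example, a transcript that drops one of three reference words scores a WER of 1/3, and a transcript that captures one of two target keywords scores a recall of 0.5.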
Keywords on Earnings-22 (English, jargon and special terms) – Earnings-22 is a dataset of English-language earnings calls gathered from global companies across 27 countries. We created a keyword version of this dataset.
CommonVoice keyword version
The CommonVoice V13 test set is part of Mozilla's open-source speech dataset, designed to support ASR research across multiple languages. It was crowd-sourced, and as such it includes diverse, user-contributed recordings from speakers of various ages, accents, and regional backgrounds, ensuring broad linguistic coverage. We created a keyword version of this test set.
Key Findings:
Jargonic V2 delivers superior keyword spotting (business-specific terminology) across technical domains and languages.
Earnings Call Dataset: Jargonic V2 outperforms Whisper and AssemblyAI Best in recognizing finance-specific terminology.
Multilingual Keyword Spotting: Jargonic V2 consistently delivers the lowest error rates across German, English, Spanish, French, and Portuguese.
Why aiOla’s Jargonic V2 Wins
Best-in-Class Accuracy – Lowest WER and CER across the tested benchmarks, plus better detection of jargon words (i.e., higher keyword recall).
Enterprise-Grade Jargon Recognition – Custom keyword spotting with no manual tuning required.
Optimized for Noisy, Multi-Speaker Environments – Excels in real-world speech conditions.
Data Privacy & Compliance – Built-in Named Entity Recognition (NER) for sensitive data protection.
Seamless API Integration – Deploy instantly into enterprise workflows and AI models.
If you’re a developer and would like to see Jargonic in action: