aiOla AI Research: Advancing Speech & Voice AI

Meet the Minds Behind Our IP

aiola’s research team is a world-class powerhouse in voice and speech AI, with seven PhDs from top companies and academic institutions.

Led by Gil Hetz, PhD, Professor Yossi Keshet, and Professor Bhiksha Raj, our experts are redefining industry standards and pioneering breakthroughs in ASR and conversational AI. Their cutting-edge work drives aiola’s unmatched accuracy and adaptability, empowering enterprises to unlock the full potential of spoken data.

Gil Hetz

VP AI, PhD

Aviv Navon

Head of Research, PhD

Aviv Shamsian

Research Tech Lead, PhD Candidate

Neta Glazer

Senior Data Scientist, PhD Candidate

Yael Segal-Feldman

Senior Data Scientist, PhD

Prof. Yossi Keshet

Chief Scientist

Prof. Ethan Fetaya

Research Lead

AI speech model aiola Drax outpaces OpenAI & Alibaba / November 2025

We developed Drax, a novel discrete flow matching framework for ASR that achieves state-of-the-art recognition accuracy while enabling highly efficient parallel decoding. Our approach uses an audio-conditioned path to better align training and inference, proving that discrete flow matching is a critical advancement for Non-Autoregressive (NAR) ASR.

Research Paper

HuggingFace

Beyond Transcription: Mechanistic Interpretability in ASR / August 2025

Interpretability methods are gaining popularity for understanding large language models, but they are underexplored in automatic speech recognition (ASR). This work applies techniques like logit lens and linear probing to ASR systems, revealing how acoustic and semantic information evolves across layers.

Research Paper

HuggingFace

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching / June 2025

UmbraTTS is a new Text-to-Speech model that generates both speech and background audio together, creating more realistic, context-aware soundscapes. It uses a self-supervised approach to learn from unannotated recordings, overcoming the lack of paired training data. The result is high-quality, natural-sounding audio with fine control over environmental sound.

Research Paper

Demo

FlowTSE: Target Speaker Extraction with Flow Matching / May 2025

Target speaker extraction (TSE) is the task of isolating a specific speaker’s voice from a mixture of overlapping speech and background audio. In this work, we explore a simple yet effective approach to TSE using flow matching.

Research Paper

Demo

Jargonic Sets New Standards for Japanese ASR / May 2025

After setting new benchmarks in English, Spanish, French, and more, Jargonic V2 now leads in Japanese as well, delivering not only superior transcription accuracy but also unmatched recall of specialized terms across industries such as manufacturing, logistics, healthcare, and finance.

Learn More

Industry-ready ASR for Enterprises: Jargonic / January 2025

An enterprise-grade speech recognition model that outperforms all competitors across both academic benchmarks and real-world business environments. In comprehensive testing, Jargonic achieved the highest accuracy on standard datasets and superior jargon recognition capabilities, establishing it as the industry’s most accurate speech-to-text solution available.

Learn More

Explore Benchmarks

Whisper in Medusa’s Ear: Multi-head Efficient Decoding for Transformer-based ASR / September 2024

A novel multi-head efficient decoding approach for transformer-based Automatic Speech Recognition (ASR), improving inference speed and accuracy.

Research Paper

GitHub

WhisperNER-tag-and-mask: Enterprise-level Speech Privacy / September 2024

A privacy-focused speech recognition approach that enables entity recognition while anonymizing sensitive information, meeting enterprise-grade security and compliance requirements.

Research Paper

HuggingFace
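
To make the tag-and-mask idea concrete, here is a minimal sketch of masking tagged entities in a transcript. It is a post-processing illustration only, not the WhisperNER model or its actual output format: the inline tag syntax (`<person>…</person>`), the `mask_entities` helper, and the label names are all assumptions made for this example.

```python
import re

# Hypothetical inline tag format: <label>entity text</label>
TAG_PATTERN = re.compile(r"<(?P<label>[a-z_]+)>(?P<text>.*?)</(?P=label)>")


def mask_entities(tagged_transcript, sensitive_labels):
    """Replace sensitive tagged entities with a placeholder; untag the rest."""

    def replace(match):
        label = match.group("label")
        if label in sensitive_labels:
            # Sensitive entity: drop the text, keep only the entity type.
            return f"[{label.upper()}]"
        # Non-sensitive entity: keep the text, drop the tags.
        return match.group("text")

    return TAG_PATTERN.sub(replace, tagged_transcript)


tagged = "<person>Dana</person> called from <location>Haifa</location> about order 17."
masked = mask_entities(tagged, {"person"})
print(masked)
```

In this sketch the set of sensitive labels is chosen by the caller, so the same tagged transcript can be masked differently depending on the privacy policy in force.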

WhisperNER: Unified Open Named Entity and Speech Recognition / September 2024

An advanced framework that integrates named entity recognition (NER) into speech-to-text pipelines, enhancing real-time voice data processing.

Research Paper

GitHub

Keyword-Guided Adaptation of Automatic Speech Recognition / June 2024

An advanced adaptation model that enhances ASR performance in specialized domains by guiding recognition with contextual keyword injection.

Research Paper

Combining Language Models for Specialized Domains: A Colorful Approach / October 2023

A novel method for combining multiple language models to improve speech recognition across specialized industries, ensuring more accurate jargon recognition.

Research Paper

Open-vocabulary Keyword-spotting with Adaptive Instance Normalization / September 2023

A cutting-edge technique enabling open-vocabulary keyword spotting using adaptive instance normalization to enhance real-time voice interaction and command execution.

Research Paper
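
As a rough illustration of adaptive instance normalization in general (not the paper's implementation), the sketch below normalizes each feature channel over time and then applies a per-channel scale and shift. In a keyword-spotting setting those scale and shift values would be predicted from the target keyword; here they are hard-coded, and the function and variable names are assumptions made for this example.

```python
from statistics import fmean, pstdev


def adaptive_instance_norm(channels, scales, shifts, eps=1e-5):
    """Normalize each channel over time, then apply its own scale and shift."""
    conditioned = []
    for values, scale, shift in zip(channels, scales, shifts):
        mean = fmean(values)
        std = pstdev(values)  # population standard deviation over time
        conditioned.append(
            [scale * (v - mean) / (std + eps) + shift for v in values]
        )
    return conditioned


# Two feature channels, four time steps each (toy values).
audio_features = [[0.2, 1.4, 0.9, 2.1], [5.0, 4.0, 6.5, 5.5]]
# Hypothetical keyword-dependent parameters; a real system would predict these.
keyword_scales = [1.0, 0.5]
keyword_shifts = [0.0, 2.0]

conditioned = adaptive_instance_norm(audio_features, keyword_scales, keyword_shifts)
```

After the transform, each channel's mean equals its shift and its standard deviation is approximately its scale, which is how the conditioning signal reshapes the audio features.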
