Heidi AI

Heidi processes 2.4 million clinical consultations weekly, with our AI Care Partner supporting 200+ medical specialties in 110 languages. At a scale of over 100 million global sessions thus far - each session representing 10-30 minutes of audio - relying on closed-source, vendor-based Automatic Speech Recognition (ASR) models presented a ceiling for our growth.

For clinicians, two things matter most: accuracy and speed. An error in a clinical note creates liability risk and erodes trust. Latency breaks the flow of care, forcing clinicians to wait when they should be focused on the patient in front of them.

The transition to a custom ASR stack powered by NVIDIA Nemotron ASR and the NVIDIA NeMo framework has been a pivotal moment for clinical AI. By moving beyond "off-the-shelf" APIs, we’ve achieved a level of performance and efficiency that was previously considered unattainable at this scale.

The Cost of Proprietary ASR

As Heidi scaled, we encountered three critical bottlenecks with general-purpose ASR providers:

Economic sustainability: Monthly transcription costs were projected to quadruple within the year, threatening our ability to make clinical AI accessible.
Clinical accuracy gaps: General models often struggled with specialized vocabulary. Errors like "met for men" instead of metformin aren't just typos - they impact the integrity of the clinical record.
Latency & friction: An end-to-end latency of ~3.0 seconds created a "wait state" in interactive clinical workflows that interrupted the physician-patient bond.

The Solution: Fine-Tuning Parakeet V2

We selected the open NVIDIA Parakeet V2 0.6B TDT (“Parakeet V2”) model as our foundation. Its Transducer-based architecture (TDT) balances high-fidelity accuracy with the low-latency requirements of real-time medical scribing.

Metric	Heidi fine-tuned Parakeet	Base Parakeet v2	Legacy vendor solution
Non-curated WER	9.4%	11.5%	13%
Medical terms F1 score	93.5%	89.7%	89.5%

Ask AI about Heidi:

Ask AI about Heidi:

How Heidi Cut ASR Costs 64% and Latency 75% with NVIDIA Nemotron Open ASR

The Cost of Proprietary ASR

The Solution: Fine-Tuning Parakeet V2

Start practicing with a partner

Results: Clinical Grade, Production Scale

Quality: Outperforming the Baseline

Latency: Real-Time Responsiveness

Impact: Sovereign, Strategic, and Financial Control

The Opportunity: Natural Voice Agents for Healthcare

Why This Matters in Clinical Settings

What This Could Mean for Clinical Front Desks

The Future: Edge and Multilingual Expansion