Sahara v2: When Voice AI
Finally Understands Africa

Africa speaks. Sahara v2 listens—accurately, reliably, at scale.

Introducing Sahara-v2

For years, voice has been called the most natural interface. Yet for hundreds of millions of people across Africa, digital systems still struggle to understand how they speak—their accents, their names, their numbers, their languages, their silence.

Most speech recognition systems were never designed for Africa. They were trained on Western speech, evaluated on Western benchmarks, and optimized for clean audio and high-resource languages. The result? Models that perform well on global leaderboards—but break down in the real world across African healthcare, finance, government, and customer support.

Today, we’re changing that.

We’re proud to introduce Intron Sahara v2:
a production-ready suite of speech recognition models built specifically for African languages, African accents, and African realities.

The Problem: When Global Models Meet African Speech

Across hospitals, call centers, banks, and public services, we consistently hear the same feedback:

  • “It works on demos, but fails with real users.”
  • “It struggles with African names and entities.”
  • “Numbers, currencies, IDs—everything gets mangled.”
  • “Silence and background noise cause hallucinations.”

These are not edge cases in Africa—they are the default conditions:

  • Heavy accent variation
  • Code-switching across languages
  • Dense named entities (people, places, organizations)
  • Numeric-heavy speech (amounts, IDs, dosages, balances)
  • Noisy, low-resource, conversational audio


Sahara v2 was built for this reality—not adapted to it.

Meet the Sahara v2 Suite

Sahara v2 (ASR)

Africa’s first production-grade bilingual and multilingual speech recognition models, supporting:

  • Accented English & French
  • 20+ African languages
  • 500+ African accents
  • Optimized for short audio clips, conversations, and limited context

It delivers state-of-the-art performance across:

  • Medical, finance, legal, and call-center speech
  • African personal, organizational, and geographic names
  • Numeric precision: currencies, decimals, IDs, measurements
  • Noise, silence, and overlapping speakers

Sahara TTS

High-quality text-to-speech with:

  • Accented English & French
  • 5+ African languages
  • Voices that sound local, familiar, and natural


Designed for IVR, voice bots, education, and public-facing services—without the “imported accent” problem.

Performance That Actually Transfers to the Real World

Multilingual African Speech (AfriVox Benchmark)

Across 20+ African languages, Sahara v2 consistently delivers the lowest Word Error Rates, often with error rates 2× to 7× lower than strong multilingual baselines.

Languages where Sahara v2 is state-of-the-art include:
Ga, Twi, Igbo, Yoruba, Sesotho, Pedi, Tswana, Zulu, Hausa, Shona, Swahili, and more.

African Accented English (Industry-Specific Benchmarks)

On accented English across medical, parliamentary, and conversational speech, Sahara v2 achieves:

  • WER below 15% across most major domains
  • ~18% on medical conversations
  • ~12% on African named entities
  • ~8% on numeric-heavy financial speech
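The figures above are Word Error Rates (WER): the number of word substitutions, deletions, and insertions needed to turn a transcript into its reference, divided by the number of words in the reference. A minimal sketch of the standard computation follows, assuming simple lowercase and whitespace tokenization. It is not our evaluation code, and real benchmarks typically normalize text (numbers, currencies, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes simple lowercase + whitespace tokenization; real benchmarks
    usually apply a richer text normalizer first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("pay ten thousand naira", "pay ten thousand nairas")` returns 0.25: one substitution over four reference words.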



Industry-Leading Noise & Silence Robustness

Unlike many general-purpose models, Sahara v2 is explicitly trained to handle:

  • Silent segments (no hallucinations)
  • Short utterances
  • Intervening silence
  • Background noise
  • Overlapping speakers
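WER is undefined when the reference transcript is empty, so silent audio is usually scored separately. As a minimal illustration, assuming a hypothetical test set of clips known to contain no speech, one simple check is the share of those clips for which a model emits any words at all. The function name and inputs are illustrative, not part of any Sahara API.

```python
def silence_hallucination_rate(transcripts: list[str]) -> float:
    """Share of silent clips for which the model emitted any words.

    `transcripts` holds a model's outputs for clips known to contain
    no speech (hypothetical test set); any non-whitespace output
    counts as a hallucination.
    """
    if not transcripts:
        raise ValueError("need at least one silent clip")
    hallucinated = sum(1 for text in transcripts if text.strip())
    return hallucinated / len(transcripts)
```

A model that returns `["", "   ", "thank you", ""]` on four silent clips scores 0.25; a model that never hallucinates scores 0.0.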

Built With the Community, For the Community

Sahara v2 is powered by millions of audio clips contributed by speakers across Africa—spanning languages, accents, professions, and environments.

We want to go further.

Build With Sahara – Developer Challenge

We’re inviting developers, data scientists, and startups to build the next generation of African voice applications using Sahara.

  • Health, finance, telco, education, agriculture, and more
  • Benchmark Sahara against global alternatives
  • Win prizes, visibility, and partnerships


📢 Africa’s voice ecosystem grows when we build together.

Built for Real Use, at Real Scale

Speech Recognition That Finally Understands Africa

Developers can integrate Sahara-v2 using the new streamlined widget or deploy with full offline support. Proven in real-world deployments with partners including Penda Health, Data.FI, ARM, and State High Courts across Nigeria, Kenya, South Africa, and Eswatini, Sahara-v2 is transforming how organizations serve their customers.

Privacy

Sahara-v2 functions without internet connectivity. By processing all data locally, the system ensures privacy and security for sensitive environments such as healthcare, legal, and finance. This on-device approach protects confidential records, supports regulatory compliance, and enables reliable deployment in remote or low-connectivity locations.

Speech recognition with exceptional accuracy and depth

Sahara-v2 delivers state-of-the-art performance for African speech understanding, adding support for African French and 23 new African languages. It outperforms competing models, with 25% better overall performance than Meta Omnilingual ASR and Gemini-3.

On the AfriVox Transcribe Benchmark, Sahara-v2 demonstrates exceptional accuracy where precision is critical. It performs over 64% better on African names, locations, and organizations (AFRINAMES) compared to models such as Gemini-3 and Azure, and over 35% better on numbers, IDs, decimals, and currency. It also proves reliable across real-world use cases, performing over 25% better across key verticals, including health, legal, finance, and call center audio.

Beyond single-language recognition, Sahara-v2 advances multilingual understanding with the world’s first bilingual Swahili-English ASR model. It is also robust in challenging audio conditions, scoring over 20% better than competitors on background noise, overlapping speakers, and silence.

Highlights

Overall Performance

Sahara-v2 performs strongly when evaluated against leading speech models, including Gemini-3, Azure, ElevenLabs, GPT-4-audio, and Whisper, consistently outperforming them across accuracy, robustness, and domain reliability.

African Names and Entities

On African names, locations, and organizations, Sahara-v2 performs over 64% better, reflecting its ability to handle culturally specific entities that are frequently misrecognized by global models. This is measured using the AFRINAMES evaluation.

Numerical & Structured Data

Sahara-v2 achieves over 35% better performance on numbers, IDs, decimals, currency, monetary values, and fractions, supporting precision-critical use cases in finance, healthcare, and legal documentation. These results are benchmarked using the NUMBERS evaluation.

Robustness in Challenging Audio

In noisy, real-world conditions, Sahara-v2 performs over 20% better, handling background noise, overlapping speakers, and silence more effectively than competing systems. Benchmarks compare Sahara-v2 with Gemini-3, Azure, Deepgram, GPT-4-audio, and Whisper.

Performance Across Verticals

Across domain-specific audio in health, legal, finance, and call center environments, Sahara-v2 performs over 25% better, ensuring reliable transcription where domain terminology and structured speech are essential.

Language Coverage and Multilingual Performance

Sahara-v2 now supports African French and 23 new African languages, bringing the total to 57. It delivers 25% better overall performance than Meta Omnilingual ASR and Gemini-3.

Built for Real Use, at Real Scale

Sahara-v2 pushes the boundaries of speech understanding with improved robustness, delivering enhanced acoustic modeling to help you handle even the most challenging audio scenarios.

In testing, Sahara-v2 scores over 20% better than competitors such as Gemini-3 and Azure on robustness to background noise, overlapping speakers, and silence. It also proves reliable in specialized fields, performing over 25% better across verticals such as health, legal, finance, and call center audio. Real-world deployments with partners such as Penda Health and the Ogun State High Court validate this capability.

Sahara-v2 helps you transcribe, understand, build, and connect anything

Transcribe anything

Sahara-v2 is built from the ground up for the complexities of African speech, supporting over 500 accents and performing well in real-world acoustic environments. It delivers production-ready transcription across medical, legal, and call center sectors, outperforming industry standards by over 25% in these critical verticals. It supports 57 languages, including 23 new African languages and African French, and uses offline and parallel processing to provide high-performance transcription at enterprise scale.

Sahara-v2 does more than process audio: it understands how people actually speak. It introduces the world’s first bilingual Swahili-English ASR, enabling seamless code-switching and natural conversation capture, a breakthrough validated by Penda Health. Built for high-stakes enterprise settings, it transcribes African names, numbers, and citations precisely, and it stays robust even with heavy background noise or overlapping speech. Partners such as the Ogun and Yobe State High Courts, ARM, and Data.FI already rely on this capability.

Benchmark Excellence

Sahara-v2 has emerged as the leading model in the African linguistic landscape, delivering the lowest Word Error Rate (WER) in every language evaluated. Its strongest results are in Pidgin (5%), Kinyarwanda (10%), and Swahili (11%). The gap is widest in head-to-head comparisons: in Kinyarwanda, Sahara-v2 reaches 10% WER against 40% for Gemini-3.0-flash, and in Tswana, 22% against Gemini’s 77%.

Across the other evaluated languages, Sahara-v2 also leads, with WERs of 11% in Twi, 16% in Zulu, 18% in Hausa, 18% in Shona, 19% in Yoruba, 19% in Luganda, 20% in Igbo, and 23% in Pedi. Its most significant result is on Pidgin: while Meta Omnilingual ASR and Gemini-3.0-flash fail to produce any measurable output, Sahara-v2 delivers high-fidelity transcription. These findings position Sahara-v2 as the sole capable solution for low-resource African language transcription.
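The Kinyarwanda and Tswana comparisons above can be restated as a relative error reduction or as a WER ratio. The sketch below is plain arithmetic on the quoted figures; it shows one common way such percentages are computed, and the "% better" figures elsewhere in this post may use a different definition.

```python
def relative_wer_reduction(baseline_wer: float, model_wer: float) -> float:
    """Fraction of the baseline's word errors that the model avoids."""
    return (baseline_wer - model_wer) / baseline_wer


def wer_ratio(baseline_wer: float, model_wer: float) -> float:
    """How many times more word errors the baseline makes than the model."""
    return baseline_wer / model_wer


# WER figures quoted above, in percent: (Gemini-3.0-flash, Sahara-v2).
comparisons = {
    "Kinyarwanda": (40.0, 10.0),
    "Tswana": (77.0, 22.0),
}

for language, (gemini, sahara) in comparisons.items():
    print(f"{language}: {relative_wer_reduction(gemini, sahara):.0%} fewer errors, "
          f"WER ratio {wer_ratio(gemini, sahara):.1f}x")
# Kinyarwanda: 75% fewer errors, WER ratio 4.0x
# Tswana: 71% fewer errors, WER ratio 3.5x
```

Both ratios fall inside the 2× to 7× range reported for the AfriVox results.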

64%+

Higher accuracy on African names, locations, and organizations

35%+

Higher accuracy on numbers, IDs, decimals, and currency

25%+

Better performance across healthcare, legal, finance, and call-center audio

20%+

Stronger robustness in noisy, real-world environments

Understand anything

Sahara-v2 was built to process African speech across languages, accents, and acoustic conditions. This includes the world’s first bilingual Swahili-English ASR model, support for 500+ accents through its new Accented English model, and resilience in challenging acoustic environments. By combining state-of-the-art acoustic modeling, multilingual performance across 57 African languages, and full offline capability, Sahara-v2 pushes the frontier of African speech understanding.

Legal professionals, such as those at the Ogun State High Court, can rely on Sahara-v2 to capture African names, locations, and organizations with precision — a task where the model performs over 64% better than competitors. Healthcare providers like Penda Health and Data.FI can also record patient consultations in bilingual Swahili-English, as Sahara-v2 performs over 25% better across verticals, including health.

Sahara-v2 also processes audio from busy call center environments, such as deployments with partner ARM, filtering through background noise and overlapping speakers to deliver accurate transcripts. In these conditions, the model performs over 20% better in robustness than other major speech models.

Build anything

Sahara-v2 makes it possible to build almost any voice application for African contexts. It handles complex acoustic variation to enable reliable voice interfaces, scoring over 20% better than competitors on robustness to background noise, overlapping speakers, and silence.

Sahara-v2 is our most capable voice AI model for real-world use to date. It makes applications more accessible with specialized capabilities for Voice Bots, Voice Autofill (for KYC, application, and admission forms), and Voice Banking. The model captures structured data such as names, IDs, and numbers accurately, performing over 35% better where precision matters. It also handles safety-critical voice interactions, performing over 25% better across verticals such as health, legal, and finance.

You can now build with Sahara-v2 using the new streamlined widget for integrations and join the Build with Sahara Developer Challenge. Sahara-v2 is also available with offline support for deployment in diverse environments.

Accelerating African voice AI development with Sahara-v2

As Sahara-v2 accelerates voice AI development, we are improving the developer experience for African voice applications. Today, we are introducing a new streamlined widget for integrations and launching the Build with Sahara Developer Challenge to help developers build and deploy solutions faster.

Sahara-v2’s advanced speech understanding and offline support give developers the tools to build robust voice applications. They can now tap into capabilities such as Voice Bots (available in 7 languages and accented English), Voice Autofill for KYC, application, and admission forms, and Voice Banking for command-driven fintech interactions. These features are built to handle complex African contexts, backed by Sahara-v2’s performance across 23 new African languages.

Connect anything

Sahara-v2 marks a significant step forward in speech recognition, particularly in challenging audio environments. It scores over 20% better than competitors such as Gemini-3 and Azure on background noise, overlapping speakers, and silence. It handles natural language complexity through the world’s first bilingual Swahili-English ASR model and voice bot support in seven languages, so you can connect any application or workflow through reliable voice interaction.

Building Sahara-v2 for critical domains

The model is built for critical domains, performing over 25% better across verticals, including health, legal, and finance. To support these sectors, we are introducing Voice Banking for command-driven fintech interactions and Voice Autofill for processing sensitive data in KYC, application, and admission forms.

Sahara-v2 has been validated by partners in high-stakes, real-world environments. This includes legal applications with the Ogun State High Court and Yobe State High Court, as well as healthcare implementations with Penda Health (Kenya) and Data.FI (Eswatini & Nigeria).

The Sahara-v2 Era Begins: A New Milestone in African Innovation

The Sahara-v2 era starts today. Here is what it brings:

For Enterprises:

  • Higher accuracy where it actually matters
  • Fewer errors, retries, and manual corrections
  • Better customer experience and operational efficiency

For Developers:

  • Models that finally work for African users
  • Clear APIs, strong documentation, real benchmarks

For Investors & Partners:

  • Proof that region-specific AI wins in real markets
  • Defensible data, infrastructure, and deployment advantage

For Africa:

  • Technology that listens, understands, and includes

Join the Build with Sahara Developer Challenge 

We are inviting developers across Africa to build the next generation of voice-enabled applications with Sahara-v2. The new streamlined widget for integrations and our offline support are now available to help you build robust solutions.

What you can build:

Real impact across Africa

Humanizing Automated Support in Financial Services

ARM is using Sahara-v2 to optimize customer service operations in its call centers. The deployment is validated for high-traffic environments and designed to stay accurate even with background noise and overlapping speakers. By using the specialized accented English model, ARM ensures that diverse customer voices are understood accurately, providing a more reliable automated support experience than standard global alternatives.

Letting Doctors Be Doctors Again

Penda Health relies on the world’s first bilingual Swahili-English model to capture patient interactions, a critical feature for environments where speakers naturally switch between languages (code-switching). These healthcare deployments are supported by Voice Autofill, which helps streamline admission forms and ensures that domain-specific medical terminology is captured with high precision.

The Nigerian Judiciary: Protecting the Integrity of the Record

The Ogun State High Court and Yobe State High Court in Nigeria have adopted Sahara-v2 to support legal transcription. To meet strict requirements for privacy and reliability, these courts utilize the system’s offline capabilities, allowing sensitive proceedings to be processed locally without depending on internet connectivity. The technology is specifically optimized to recognize African names, locations, and organizations, ensuring that the judicial record remains accurate and preserves local context.

Availability

Sahara-v2 is available now for developers, enterprises, and organizations across Africa.

For developers

Access the new streamlined widget for integrations

For Enterprise Partnerships

Empower your organization with specialized voice technology.

The Future of African Voice AI

Gain deep insights into the evolving technological landscape with our upcoming 2026 Africa Voice AI Report. The report explores the trends, challenges, and opportunities shaping the future of speech interfaces across the continent.