Best-in-class speech recognition and text-to-speech models for African accents
Beats OpenAI, Google, AWS, Azure across multiple benchmarks
Today, we’re launching Sahara — a breakthrough family of speech recognition models trained on thousands of hours of proprietary audio from 18,000+ speakers, across 300+ non-native English accents from 30+ African countries. Powered by our proprietary AccentMix™ algorithm, Sahara doesn’t just keep up — it outperforms OpenAI’s Whisper, GPT-4o Transcribe, Nvidia Canary, Google Speech-to-Text, AWS Transcribe, and Azure Speech across the board.
We’re talking scripted, conversational, and real-world, in-the-wild speech, across high-impact domains like healthcare, finance, and legal, plus African named entities and voice commands. No accent left behind.
Here’s the kicker: we’re a tiny, seed-stage startup. We don’t have the luxury of bottomless compute or internet-scale data. So we had to do things differently — leaner, smarter, and relentlessly focused on real-world performance. That’s how AccentMix was born: a patented algorithm purpose-built to handle the rich diversity of English accents across the African continent.
We’re incredibly proud of what we’ve built.
See the benchmarks. Hear the difference. Welcome to Sahara.
Medical
The app can make reporting 100% faster and performs better than Google transcribe
Radiologist
University College Hospital, Ibadan
Medical
I really don’t like typing much honestly but with the speech to text app it has changed everything for me and made it easier with me just having to edit just a little, instead of typing a full page.
Radiologist
Aminu Kano Teaching Hospital
Medical
It has decreased the time spent during documentation which gives me the chance to focus more on the hands-on management of patients.
Radiology
Barau Dikko Teaching Hospital
Medical
Accuracy was in the low 90% range. Had to make four or so corrections. Dropped off if I spoke faster, but if I kept a decent pace it was pretty accurate.
Physician
Chris Hani Baragwanath Academic Hospital
Medical
I used the clinical dictation mode which captures most of the medical terms accurately. The interface is really user friendly. And easy to navigate.
Physician
Zuri Health
Word Error Rate (WER) is a common way to measure how accurate speech recognition systems are. It compares what the system heard to what was actually said. It measures the model’s error, so lower is better. It divides the number of word-level errors by the total number of words that were actually said.
Why it matters: WER tells us how reliable a speech-to-text system is. A lower WER means fewer mistakes and better performance—critical for areas like healthcare, legal, and customer service.
Strengths: simple to calculate; easy to compare different systems; works across languages
Weaknesses: treats all errors equally, even if some are more harmful (e.g., “don’t take” vs “take”); even single-character errors like “carrot” vs “carot” get the full penalty, so it can be overly harsh; doesn’t consider punctuation or context; may not reflect user satisfaction or usefulness
WER = (Substitutions + Insertions + Deletions) ÷ Total Words
Substitutions: wrong words
Insertions: extra words
Deletions: missing words
Spoken: “Take your medicine daily”
Transcript: “Take your message daily”
WER = 1 error ÷ 4 words = 25%
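The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring code behind the benchmarks on this page: the function name `word_error_rate` is hypothetical, and the only text normalization assumed here is lowercasing and splitting on whitespace. Real evaluations often apply more normalization (punctuation, numbers), which changes the score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + insertions + deletions) / words in the reference."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the fewest edits needed to turn the reference words
    # processed so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)  # wrong word (free if equal)
            deletion = prev[j] + 1       # spoken word missing from the transcript
            insertion = curr[j - 1] + 1  # extra word in the transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# The example above: one substitution over four spoken words.
print(word_error_rate("Take your medicine daily", "Take your message daily"))  # 0.25
```

Because insertions count as errors, the result can exceed 100% when the transcript contains many extra words.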
Punching well above its weight against models 2-3x its size, Sahara demonstrates superior performance on accented English speech in a pan-African context across multiple industries (health, finance, legal, academia, etc.) and domains, with impressive robustness to background noise, intonation, and domain-specific vocabulary.
Sahara demonstrates superior performance in accented voice recognition in healthcare, leading several open and closed models in the recognition of complex medical terminology across specialties, including diagnoses, measurements, imaging and lab results, and medications, in over 300 African accents under diverse ambient clinical settings.
a 200+hr public benchmark dataset of scripted (read) clinical speech in 120 African accents from 2,463 speakers in 13 countries
a public pan-African conversational speech dataset of 49 spontaneous medical and non-medical conversations with 14 African accents across 3 countries
an unreleased multi-institution, multi-specialty dataset of real-world medical speech in clinical settings across 6 countries, 200+ speakers and >50 accents
an unreleased multi-country dataset of doctors testing out voice transcription in various real-world clinical settings with significant ambient hospital noise
a recently released medical multispecialty dataset of 25 simulated long-form doctor-patient conversations from male and female doctor and patient actors across Nigeria
an unreleased dataset of 30+ minute-long multispeaker clinical research interviews from East Africa
an unreleased dataset of real-world telephone call center conversations between various agents and customers sampled at 8kHz.
a 3hr subset of the Afri-Names dataset rich with voice commands for multiple scenarios, e.g. “Hey Bixby, transfer 3,500 KES from my MPESA account to Account number A123Z789 at Standard Bank”, OR “Alexa, play ‘Love me Jeje’ by Tems”
a 4hr subset of the Afri-Names dataset rich in numbers, fractions, measurements, decimals, currency, etc e.g. “Lapo Microfinance Bank had a turnover of N3.687 billion in 2023, a 6.81% year over year increase”
our most challenging dataset ever, a 9-hour novel open pan-African accented read speech dataset rich with African named entities, proper nouns, numbers, fractions, currency, simulated IDs, and voice-assistant commands for evaluating ASR models on various tasks and domains like finance, healthcare, and speech commands, with 500+ unique speakers from 10+ countries.
a 2hr subset of the Afri-Names dataset enriched with African named entities, person, location, organization names and dates, e.g. “Halima, Hamzat, Shola, and Chinedu were childhood friends who grew up in Yaba in Lagos from June 1998 to December 2006”.
a 35+ hour open pan-African transcribed dataset of legislative/parliamentary proceedings with ambient noise, multiple speakers, African names and locations, with over 1,000 speakers from 4 countries: Nigeria, South Africa, Kenya, and Ghana
an unreleased African-accented dataset of court hearings rich in legal terminology, proper nouns and Latin words
a multi-country multi-accent dataset with 2+ hrs of read/scripted and conversational speech from Nigeria, South Africa, Kenya, Ghana, Rwanda, and North Africa (Egypt, Morocco, Algeria, etc.)
general purpose cross-domain speech recognition model
streaming model optimised for medical conversations
biometric voice-based authentication tuned for African accents and languages to combat fraud
first production pan-African accented speech synthesis model supporting 30 African accents spoken across 10+ countries
SOTA automatic speech translation models on 20 African languages
The first production pan-African accented speech synthesis model with 54 personas from 13 countries, representing 34 African accents with female and male voices.
Spoof-aware Voice authentication and security, tuned for African voices, accents and languages to combat fraud and deepfakes
Best-in-class single-speaker clinical speech recognition for African accents
“Our doctors are a lot faster now with documentation, about 57 seconds per note using Intron’s clinical dictation. Intron is on par or better than other transcription tools we’ve tested. Even the errors are negligible.” – Program Lead, EHA Clinics. By harnessing the power of AI, EHA Clinics aspires to set new standards in healthcare delivery, ensuring that every patient receives timely and personalized care while paving the way for future advancements in the medical field.
Health
Best-in-class clinical speech recognition for African accents
“Scaling up our HMIS nationally made it clear several doctors couldn’t cope because keyboards made them much slower. We tapped Intron to eliminate this bottleneck, increase adoption, and help with medical translation for doctors who don’t speak Kinyarwanda.” By integrating this technology, the Ministry seeks to empower healthcare providers, allowing them to focus more on patient care rather than administrative tasks. The anticipated impact includes improved patient outcomes, increased efficiency in medical documentation, and the ability to bridge language barriers in a multilingual society.
Health
Best-in-class single-speaker clinical speech recognition for African accents
Dr. Abdulahi, a radiologist at the hospital, shares his experience with the speech-to-text application, stating, “I really don’t like typing much honestly but with the speech to text app it has changed everything for me and made it easier with me just having to edit just a little, instead of typing a full page.” By minimizing manual input, AKTH is positioning itself to harness the full potential of AI, aiming for a future where healthcare delivery is more efficient and effective.
Health
Multi-speaker speech recognition and automated clinical note generation for African accents
“We’re Uganda’s largest private hospital network seeing over 64,000 patients per month. Adopting Intron’s AI technology helps reduce burdens on clinicians, reduce errors, waiting time and improve patient experience.” By leveraging this technology, the hospital aims to foster a culture of precision and efficiency in patient care. The anticipated impact includes improved accuracy in clinical records, enhanced communication among healthcare teams, and a greater capacity to focus on patient interactions.
Health
Multi-speaker speech recognition and automated clinical note generation for African accents
Dr. Brian, CMO at RUPHA, emphasized: “Intron’s Voice-to-text increases throughput for members on our platform especially for telehealth visits, reducing turnaround time, improving operational excellence and increasing revenue per doctor per day since physicians can see more patients in less time.” This advanced solution facilitates accurate and efficient documentation during patient consultations, ensuring that critical information is captured seamlessly.
Health
Multi-speaker speech recognition and automated clinical note generation for African accents
“We see over 3000 patients a day during our health camps, putting significant pressure on doctors to document each encounter. Intron’s conversation mode makes sure each encounter receives high quality notes, helping us meet quality and compliance requirements.” By facilitating real-time, high-quality note-taking during patient interactions, Intron helps Zuri Health maintain comprehensive records while ensuring compliance with healthcare reporting standards.
Health
Best-in-class text-to-speech with 80+ African female and male voices in 40+ accents from 13 countries
“Working with the South African government, we see an opportunity to boost engagement with the youth on reproductive health by providing voice as a conversation option since text-only interactions have inherent limitations. We’re excited to bring Intron’s voice capabilities into our AI Hub.” This initiative aims to provide a more accessible and empathetic communication channel for young people. By incorporating voice as a conversation option, Audere seeks to improve user engagement and health outcomes in reproductive health services.
Call Center
Multi-speaker speech recognition, automated summarization and agent scoring
“We’re collaborating with Intron to enhance call center service quality by providing personalized daily compliance and performance feedback to agents, a copilot on calls, improving CX excellence, and enhancing customer experience.” This technological advancement is poised to elevate customer satisfaction, boost agent retention, and streamline service delivery.
Call Center, CX Intelligence
Best-in-class ambient multi-speaker speech recognition and summarization for African accents
“Before now, we had to write down everything. It was exhausting and slow. Now, we can focus on what matters. What used to take 4+ hours now concludes in 2–3 hours. My Lord no longer has to write during proceedings. He now focuses entirely on what is being said, ensures everything is properly recorded, and we’re achieving much more in significantly less time than before.” This innovation allows judges to focus entirely on the dialogue in the courtroom, enhancing attention, accuracy, and speed. The integration of AI has significantly reduced session times, enabling more cases to be heard and expediting the delivery of justice.
Legal
Voice authentication, commands, and voice OTP/MFA, tuned for African voices to combat fraud and deepfakes
Crowdsourcing platforms have a massive problem with duplicate accounts where contributors seek to increase earnings by creating multiple accounts under pseudonyms. With advanced voice authentication tuned for African accents, we’re able to detect and mitigate the proliferation of duplicate accounts on our platform.
Biometrics
Best-in-class single-speaker and multi-speaker clinical speech recognition for African accents
“We’ve been thinking about this for a long time. Intron is meeting an important need for doctors across our group hospitals, helping improve efficiency and reduce wait times.” This innovative tool is designed to enhance communication between doctors and the hospital system, allowing for more efficient documentation and data retrieval.
Health
Multi-speaker speech recognition and automated summarization and content indexing for African accents
“It could take over 4 months to review 400 hours of user interviews given our lean team, but with fast and accurate transcription, we can quickly analyze the content of the interview, extract key concepts, organize the knowledge and derive insights much faster.” This advancement not only improves operational efficiency but also empowers organizations to make data-driven decisions more rapidly.
Content Indexing