Best-in-class speech recognition and text-to-speech models for African accents
Beats OpenAI, Google, AWS, and Azure across multiple benchmarks
Today, we’re launching Sahara — a breakthrough family of speech recognition models trained on thousands of hours of proprietary audio from 18,000+ speakers, across 300+ non-native English accents from 30+ African countries. Powered by our proprietary AccentMix™ algorithm, Sahara doesn’t just keep up — it outperforms OpenAI’s Whisper, GPT-4o Transcribe, Nvidia Canary, Google Speech-to-Text, AWS Transcribe, and Azure Speech across the board.
Medical
Here’s the kicker: we’re a tiny, seed-stage startup. We don’t have the luxury of bottomless compute or internet-scale data. So we had to do things differently — leaner, smarter, and relentlessly focused on real-world performance.
CEO of Company
Word Error Rate (WER) is a common way to measure how accurate speech recognition systems are. It compares the system’s transcript against what was actually said (the reference). Because it measures the model’s errors, lower is better. It divides the number of word-level errors by the total number of words in the reference.
Why it matters: WER tells us how reliable a speech-to-text system is. A lower WER means fewer mistakes and better performance—critical for areas like healthcare, legal, and customer service.
Strengths: simple to calculate; easy to compare across systems; works across languages
Weaknesses: treats all errors equally, even when some are more harmful (e.g., “don’t take” vs “take”); even a single-character error (e.g., “carrot” vs “carot”) receives the full penalty of a word substitution, so it can be overly harsh; doesn’t consider punctuation or context; may not reflect user satisfaction or usefulness
WER = (Substitutions + Insertions + Deletions) ÷ Total Words in the reference
Substitutions: wrong words
Insertions: extra words
Deletions: missing words
Spoken: “Take your medicine daily”
Transcript: “Take your message daily”
WER = 1 error (substitution: “medicine” → “message”) ÷ 4 words = 25%
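The formula above does not say how the errors are counted. The usual approach, assumed here, is to align the reference and the transcript with a word-level minimum edit distance (Levenshtein) and read the substitutions, insertions and deletions off the cheapest alignment. This is a minimal sketch under that assumption; `word_error_rate`, `WerResult` and the lowercase-and-split normalization are hypothetical names and choices, not the scoring pipeline behind Sahara’s benchmark results.

```python
from dataclasses import dataclass


@dataclass
class WerResult:
    substitutions: int
    insertions: int
    deletions: int
    reference_words: int

    @property
    def wer(self) -> float:
        # WER = (S + I + D) / N, with N = number of words in the reference
        errors = self.substitutions + self.insertions + self.deletions
        return errors / self.reference_words


def normalize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only. Real scoring
    # pipelines typically also strip punctuation and normalize numbers.
    return text.lower().split()


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    """Hypothetical helper: WER via word-level Levenshtein alignment."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (edits, S, I, D) for the cheapest alignment of ref[:i] and hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, 0, i)  # only deletions: every reference word missing
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, j, 0)  # only insertions: every hypothesis word extra

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # words match: no new error
                continue
            e, s, ins, d = dp[i - 1][j - 1]
            substitute = (e + 1, s + 1, ins, d)
            e, s, ins, d = dp[i][j - 1]
            insert = (e + 1, s, ins + 1, d)
            e, s, ins, d = dp[i - 1][j]
            delete = (e + 1, s, ins, d + 1)
            # Tuples compare on edit count first, so min() keeps the cheapest path
            dp[i][j] = min(substitute, insert, delete)

    _, s, ins, d = dp[-1][-1]
    return WerResult(substitutions=s, insertions=ins, deletions=d, reference_words=len(ref))


if __name__ == "__main__":
    result = word_error_rate("Take your medicine daily", "Take your message daily")
    print(result)
    print(f"WER = {result.wer:.0%}")
```

Run on the example above, it reports one substitution out of four reference words (25%). Because the comparison is whole-word, `word_error_rate("carrot", "carot")` scores a full substitution (WER 100%), which is the harshness noted under Weaknesses.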
Punching well above models 2-3x its size, Sahara delivers superior performance on accented English speech in a pan-African context across multiple industries and domains (health, finance, legal, academia, etc.), with impressive robustness to background noise, varied intonation, and domain-specific vocabulary.
General-purpose, cross-domain speech recognition model
Streaming model optimised for medical conversations
Spoof-aware biometric voice authentication and security, tuned for African voices, accents and languages to combat fraud and deepfakes