Intron AfriSpeech-200 Automatic Speech Recognition Challenge

Can you create an automatic speech recognition (ASR) model for African accents, for use by doctors?

$5,000 USD

AfriSpeech-200 ASR Challenge

African hospitals have some of the lowest doctor-patient ratios in the world. At very busy clinics, doctors could see over 30 patients a day without any of the productivity-boosting tools available to their colleagues in developed countries. Clinical speech-to-text is ubiquitous in the developed world but virtually absent across African hospitals. This competition seeks to create pan-African English ASR models for healthcare, expanding clinical speech recognition access to African clinics to help alleviate the burden of daily clinical documentation.

This is the largest and most diverse open-source accented speech dataset for clinical and general domain ASR in Africa: 200 hours of audio across 67,577 clips from 2,463 unique speakers (52% female), covering 120 African accents from 13 countries.
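
To get started, the corpus can be pulled straight from the Hugging Face Hub. The sketch below is a minimal example assuming the community dataset id tobiolatunji/afrispeech-200, an "isizulu" accent configuration, and a "transcript" text column; confirm the exact repository name, configurations, and column names on the Hub before relying on them.

```python
# Minimal sketch: stream a few AfriSpeech-200 clips from the Hugging Face Hub.
# The dataset id "tobiolatunji/afrispeech-200", the "isizulu" config, and the
# "transcript" column are assumptions -- verify them on the Hub first.
from datasets import Audio, load_dataset

ds = load_dataset("tobiolatunji/afrispeech-200", "isizulu", split="train", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # resample for typical ASR models

for i, sample in enumerate(ds):
    print(sample["transcript"], sample["audio"]["array"].shape)
    if i == 2:  # only peek at the first three clips
        break
```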

NLP/ASR Research

Featured Research and Publications

AfriSpeech-200: Pan-African accented speech dataset for clinical and general domain ASR

Abstract
Africa has a very low doctor-to-patient ratio. At very busy clinics, doctors could see 30+ patients per day, a heavy patient burden compared with developed countries, yet productivity tools such as clinical automatic speech recognition (ASR) are lacking for these overworked clinicians. In contrast, clinical ASR is mature, even ubiquitous, in developed nations, and clinician-reported performance of commercial clinical ASR systems is generally satisfactory. Furthermore, the recent performance of general domain ASR is approaching human accuracy. However, several gaps exist. Several publications have highlighted racial bias in speech-to-text algorithms, and performance on minority accents lags significantly. To our knowledge, there is no publicly available research or benchmark on accented African clinical ASR, and speech data is non-existent for the majority of African accents. We release AfriSpeech, 200 hours of Pan-African speech: 67,577 clips from 2,463 unique speakers across 120 indigenous accents from 13 countries, for clinical and general domain ASR, together with a benchmark test set and publicly available pre-trained models with SOTA performance on the AfriSpeech benchmark.
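
To reproduce a simple baseline against which such pre-trained models can be compared, one common recipe is to transcribe held-out clips with a generic multilingual checkpoint and score the output with word error rate (WER). The sketch below assumes openai/whisper-small as the baseline and the column names from the dataset sketch above; it is not the official AfriSpeech evaluation script.

```python
# Hedged sketch of a WER baseline: transcribe clips with a generic Whisper
# checkpoint and score against reference transcripts. Model choice and column
# names are assumptions; this is not the official benchmark scoring code.
import torch
from jiwer import wer
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device=0 if torch.cuda.is_available() else -1,
)

def score(samples):
    """samples: iterable of dicts with 'audio' and 'transcript' fields."""
    refs, hyps = [], []
    for s in samples:
        pred = asr({"raw": s["audio"]["array"], "sampling_rate": s["audio"]["sampling_rate"]})
        refs.append(s["transcript"].lower())
        hyps.append(pred["text"].lower())
    return wer(refs, hyps)  # lower is better
```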

Advancing African Accented Clinical Speech Recognition with Generative and Discriminative Multitask Supervision

Abstract
Although automatic speech recognition (ASR) could be considered a solved problem in the context of high-resource languages like English, ASR performance for accented speech is significantly inferior. The recent emergence of large pretrained ASR models has facilitated multiple transfer learning and domain adaptation efforts, in which performant general-purpose ASR models are fine-tuned for specific domains, such as clinical or accented speech. However, African accented clinical speech recognition remains largely unexplored. We propose a semantically aligned, domain-specific multitask learning framework (generative and discriminative) and demonstrate empirically that semantically aligned multitask learning enhances ASR, outperforming the single-task architecture by 2.5% (relative). We discover that the generative multitask design improves generalization to unseen accents, while the discriminative multitask approach improves clinical ASR for majority and minority accents.
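
The authors' exact architecture is not reproduced here, but the general pattern the abstract describes, a shared speech encoder trained jointly on the transcription objective and an auxiliary discriminative task with a weighted loss, can be sketched as follows. The accent-classification head and the 0.3 weighting are illustrative assumptions, not the paper's design; in a generative variant, the auxiliary head would instead decode an additional text target rather than predict a class label.

```python
# Illustrative multitask ASR objective: one shared encoder feeds a CTC
# transcription head and an auxiliary (discriminative) accent classifier.
# The accent head and the 0.3 auxiliary weight are assumptions for illustration.
import torch
import torch.nn as nn

class MultitaskASR(nn.Module):
    def __init__(self, encoder: nn.Module, hidden: int, vocab_size: int, n_accents: int):
        super().__init__()
        self.encoder = encoder                        # e.g. a pretrained speech encoder
        self.ctc_head = nn.Linear(hidden, vocab_size)
        self.accent_head = nn.Linear(hidden, n_accents)
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
        self.ce_loss = nn.CrossEntropyLoss()

    def forward(self, feats, feat_lens, targets, target_lens, accent_ids, aux_weight=0.3):
        h = self.encoder(feats)                               # (batch, time, hidden)
        log_probs = self.ctc_head(h).log_softmax(dim=-1)      # per-frame token scores
        asr_loss = self.ctc_loss(log_probs.transpose(0, 1), targets, feat_lens, target_lens)
        accent_loss = self.ce_loss(self.accent_head(h.mean(dim=1)), accent_ids)
        return asr_loss + aux_weight * accent_loss            # joint multitask objective
```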

AfriNames: Most ASR models “butcher” African Names

Abstract
Useful conversational agents must accurately capture named entities to minimize error for downstream tasks, for example, asking a voice assistant to play a track from a certain artist, initiating navigation to a specific location, or documenting a diagnosis result for a specific patient. However, where named entities such as "Ukachukwu" (Igbo), "Lakicia" (Swahili), or "Ingabire" (Rwandan) are spoken, automatic speech recognition (ASR) models' performance degrades significantly, propagating errors to downstream systems. We model this problem as a distribution shift and demonstrate that such model bias can be mitigated through multilingual pre-training, intelligent data augmentation strategies to increase the representation of African-named entities, and fine-tuning multilingual ASR models on multiple African accents. The resulting fine-tuned models show an 86.4% relative improvement compared with the baseline on samples with African-named entities.
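
For readers unfamiliar with the metric, the relative improvement figures quoted in these abstracts follow the standard definition of relative WER reduction, sketched below with hypothetical numbers.

```python
# Relative WER improvement = (baseline WER - new WER) / baseline WER.
# The 0.50 and 0.068 values are hypothetical, chosen only to illustrate the formula.
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    return (baseline_wer - new_wer) / baseline_wer

print(f"{relative_wer_improvement(0.50, 0.068):.1%}")  # -> 86.4%
```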

Our Partners