Although automatic speech recognition (ASR) could be considered a solved problem for high-resource languages like English, ASR performance on accented speech remains significantly worse. The recent emergence of large pretrained ASR models has enabled numerous transfer learning and domain adaptation efforts, in which performant general-purpose ASR models are fine-tuned for specific domains, such as clinical or accented speech. However, African-accented clinical speech recognition remains largely unexplored. We propose a semantically aligned, domain-specific multitask learning framework (generative and discriminative) and demonstrate empirically that semantically aligned multitask learning enhances ASR, outperforming the single-task architecture by 2.5% (relative). We find that the generative multitask design improves generalization to unseen accents, while the discriminative multitask approach improves clinical ASR for both majority and minority accents.
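The discriminative variant of such a framework is typically trained by combining the primary ASR objective with an auxiliary classification loss. As a minimal sketch (not the paper's actual implementation), the weighted combination of an ASR loss with a hypothetical accent-classification head might look like this; the function names, the weighting scheme, and the `alpha` parameter are illustrative assumptions:

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for one auxiliary example
    # (e.g. an accent-ID prediction); numerically stabilized.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multitask_loss(asr_loss, aux_logits, aux_label, alpha=0.1):
    # Weighted sum of the primary ASR loss (e.g. CTC or
    # cross-entropy over transcripts) and a hypothetical
    # discriminative auxiliary loss; alpha trades the two off.
    return asr_loss + alpha * cross_entropy(np.asarray(aux_logits), aux_label)

# Example: with alpha = 0 the objective reduces to plain single-task ASR.
single_task = multitask_loss(2.0, [0.0, 0.0, 0.0], 1, alpha=0.0)
joint = multitask_loss(2.0, [0.0, 0.0, 0.0], 1, alpha=0.3)
```

With uniform auxiliary logits the added term is `alpha * log(num_classes)`, so the joint loss degrades gracefully toward the single-task objective as `alpha` shrinks.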
