Noise-Robust Multilingual Speech Recognition and the Tatar Speech Corpus

Saida Mussakhojayeva, Rinat Gilmullin, Bulat Khakimov, Mansur Galimov, Daniil Orel, Adal Adilbekov, Huseyin Atakan Varol

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

After a long period of focus on individual languages, research in automatic speech recognition has recently turned to multilingual models. For instance, Whisper by OpenAI is capable of recognizing speech in 99 languages. However, the performance of Whisper is significantly lower for low-resource languages than for high-resource ones. In this work, we aim to address this gap and present a fine-tuning strategy for the pre-trained Whisper model that improves its performance for a low-resource language family while maintaining performance for a set of high-resource languages. Specifically, our Soyle model exhibited high performance for both the Turkic language family (11 languages) and the official languages of the United Nations. Our work also presents the first large open-source speech corpus for the Tatar language. We demonstrate that speech recognition performance for Tatar improves when the model is trained using the new Tatar Speech Corpus (TatSC). Our model is also trained to be noise-robust. We open-source our model and TatSC to encourage further research. We envision that our fine-tuning approach will guide the creation of multilingual speech recognition models for other low-resource language families.

Original language: English
Title of host publication: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 732-737
Number of pages: 6
ISBN (Electronic): 9798350344349
Publication status: Published - 2024
Event: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024 - Osaka, Japan
Duration: Feb 19 2024 to Feb 22 2024

Publication series

Name: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024

Conference

Conference: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024
Country/Territory: Japan
City: Osaka
Period: 2/19/24 to 2/22/24

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Safety, Risk, Reliability and Quality
  • Health Informatics
