TY - GEN
T1 - Noise-Robust Multilingual Speech Recognition and the Tatar Speech Corpus
AU - Mussakhojayeva, Saida
AU - Gilmullin, Rinat
AU - Khakimov, Bulat
AU - Galimov, Mansur
AU - Orel, Daniil
AU - Adilbekov, Adal
AU - Varol, Huseyin Atakan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - After focusing on individual languages for a long time, multilingual automatic speech recognition has recently become an active area of research. For instance, Whisper by OpenAI is capable of recognizing speech in 99 languages. However, the performance of Whisper is significantly lower for low-resource languages than for high-resource ones. In this work, we aim to address this and present a fine-tuning strategy for the pre-trained Whisper model so that its performance is improved for a low-resource language family while maintaining performance for a set of high-resource languages. Specifically, our Soyle model exhibited high performance for both the Turkic language family (11 languages) and the official languages of the United Nations. Our work also presents the first large open-source speech corpus for the Tatar language. We demonstrate that speech recognition performance for Tatar improves with the model trained using the new Tatar Speech Corpus (TatSC). Our model is also trained to be noise-robust. We open-source our model and TatSC to encourage further research. We envision that our fine-tuning approach will guide the creation of multilingual speech recognition models for other low-resource language families.
AB - After focusing on individual languages for a long time, multilingual automatic speech recognition has recently become an active area of research. For instance, Whisper by OpenAI is capable of recognizing speech in 99 languages. However, the performance of Whisper is significantly lower for low-resource languages than for high-resource ones. In this work, we aim to address this and present a fine-tuning strategy for the pre-trained Whisper model so that its performance is improved for a low-resource language family while maintaining performance for a set of high-resource languages. Specifically, our Soyle model exhibited high performance for both the Turkic language family (11 languages) and the official languages of the United Nations. Our work also presents the first large open-source speech corpus for the Tatar language. We demonstrate that speech recognition performance for Tatar improves with the model trained using the new Tatar Speech Corpus (TatSC). Our model is also trained to be noise-robust. We open-source our model and TatSC to encourage further research. We envision that our fine-tuning approach will guide the creation of multilingual speech recognition models for other low-resource language families.
UR - http://www.scopus.com/inward/record.url?scp=85189929266&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189929266&partnerID=8YFLogxK
U2 - 10.1109/ICAIIC60209.2024.10463419
DO - 10.1109/ICAIIC60209.2024.10463419
M3 - Conference contribution
AN - SCOPUS:85189929266
T3 - 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024
SP - 732
EP - 737
BT - 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024
Y2 - 19 February 2024 through 22 February 2024
ER -