Leveraging Wav2Vec2.0 for Kazakh Speech Recognition: An Experimental Study

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the fast-growing field of neural networks, models trained on extensive multilingual text and speech data have shown great promise for improving speech technology for low-resource languages. This study applies state-of-the-art speech recognition models, specifically Facebook’s Wav2Vec2.0 and Wav2Vec2-XLSR, to the Kazakh language. The primary objective is to evaluate how accurately these models transcribe spoken Kazakh. The research also explores whether pre-training on data from other languages is useful and whether fine-tuning the model on Kazakh data improves its performance. More broadly, this work offers insight into how effective pre-trained multilingual models are when applied to low-resource languages. The fine-tuned Wav2Vec2-XLSR model performed strongly, achieving a character error rate (CER) of 1.9 and a word error rate (WER) of 8.9 on the test set of the Kazcorpus dataset. These findings may help build more robust automatic speech recognition (ASR) systems for Kazakh, which could be used in applications such as voice-activated assistants and speech-to-text translators.
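CER and WER are both edit-distance metrics: the number of substitutions, deletions and insertions needed to turn the model's transcript into the reference, divided by the length of the reference (in characters or words). The abstract does not describe the exact scoring procedure, so the following is only a minimal sketch of the standard definition, assuming scores are reported as percentages, that spaces count as characters for CER, and that test-set scores are aggregated as total errors over total reference length. The function names are illustrative, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; each substitution,
    deletion and insertion costs 1."""
    # At the start of row i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r from the reference
                curr[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),   # substitute (cost 0 if the items match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (assumes a non-empty reference)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent; spaces are counted as characters here."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(references, hypotheses) -> float:
    """Test-set WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return 100.0 * errors / total


if __name__ == "__main__":
    # One wrong letter is one character error but a whole word error.
    print(wer("сәлем әлем", "сәлем алем"))  # 50.0
    print(cer("сәлем әлем", "сәлем алем"))  # 10.0
```

The usage example shows why WER is typically several times higher than CER for the same output: a single misrecognised letter counts as one error out of ten characters but one error out of only two words.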

Original language: English
Title of host publication: Computational Science and Its Applications - ICCSA 2024 - 24th International Conference, 2024, Proceedings
Editors: Osvaldo Gervasi, Beniamino Murgante, Chiara Garau, David Taniar, Ana Maria A. C. Rocha, Maria Noelia Faginas Lago
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 120-132
Number of pages: 13
ISBN (Print): 9783031646072
DOIs
Publication status: Published - 2024
Event: 24th International Conference on Computational Science and Its Applications, ICCSA 2024 - Hanoi, Viet Nam
Duration: Jul 1 2024 to Jul 4 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14814 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Computational Science and Its Applications, ICCSA 2024
Country/Territory: Viet Nam
City: Hanoi
Period: 7/1/24 to 7/4/24

Keywords

  • Automatic speech recognition
  • Kazakh language
  • Pre-trained transformer models
  • Speech representation models
  • Wav2Vec 2.0
  • Wav2Vec2-XLSR

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
