Perceptual MVDR-based unsupervised built-in speaker normalization for Kazakh speech recognition

Zhandos Yessenbayev, Umit Yapanel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we present a novel approach to unsupervised speaker normalization built on the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method is effective for phonetic recognition on TIMIT and then apply it to Kazakh speech recognition. In our experiments, the method yields a relative improvement in ASR performance of up to 20%. An analysis of the optimal warp factors selected by the algorithm reveals a clear separation by gender, which may be useful for gender/speaker classification tasks.

Original language: English
Title of host publication: 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479941209
DOI: https://doi.org/10.1109/ICAICT.2014.7035914
Publication status: Published - 2014
Event: 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Astana, Kazakhstan
Duration: Oct 15 2014 to Oct 17 2014

Other

Other: 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014
Country: Kazakhstan
City: Astana
Period: 10/15/14 to 10/17/14

Fingerprint

Speech analysis
Speech recognition
Experiments

Keywords

  • Kazakh speech recognition
  • Phone recognition
  • Unsupervised speaker normalization

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computer Networks and Communications

Cite this

Yessenbayev, Z., & Yapanel, U. (2014). Perceptual MVDR-based unsupervised built-in speaker normalization for Kazakh speech recognition. In 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings [7035914]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICAICT.2014.7035914

@inproceedings{8e7f469769bf4dc89dd8c212a59c5efc,
title = "Perceptual MVDR-based unsupervised built-in speaker normalization for Kazakh speech recognition",
abstract = "In this work, we present a novel approach to unsupervised speaker normalization built on the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method is effective for phonetic recognition on TIMIT and then apply it to Kazakh speech recognition. In our experiments, the method yields a relative improvement in ASR performance of up to 20{\%}. An analysis of the optimal warp factors selected by the algorithm reveals a clear separation by gender, which may be useful for gender/speaker classification tasks.",
keywords = "Kazakh speech recognition, Phone recognition, Unsupervised speaker normalization",
author = "Zhandos Yessenbayev and Umit Yapanel",
year = "2014",
doi = "10.1109/ICAICT.2014.7035914",
language = "English",
booktitle = "8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
}

TY - GEN

T1 - Perceptual MVDR-based unsupervised built-in speaker normalization for Kazakh speech recognition

AU - Yessenbayev, Zhandos

AU - Yapanel, Umit

PY - 2014

Y1 - 2014

AB - In this work, we present a novel approach to unsupervised speaker normalization built on the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method is effective for phonetic recognition on TIMIT and then apply it to Kazakh speech recognition. In our experiments, the method yields a relative improvement in ASR performance of up to 20%. An analysis of the optimal warp factors selected by the algorithm reveals a clear separation by gender, which may be useful for gender/speaker classification tasks.

KW - Kazakh speech recognition

KW - Phone recognition

KW - Unsupervised speaker normalization

UR - http://www.scopus.com/inward/record.url?scp=84988240380&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84988240380&partnerID=8YFLogxK

U2 - 10.1109/ICAICT.2014.7035914

DO - 10.1109/ICAICT.2014.7035914

M3 - Conference contribution

AN - SCOPUS:84988240380

BT - 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -