Implementation of the intelligent voice system for Kazakh

Zh Yessenbayev, N. Saparkhojayev, T. Tibeyev

Research output: Contribution to journal - Article

Abstract

Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese and Russian. For Kazakh, the situation is less developed and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious if the country's large size and small population are considered. Landline and cell phones become the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The speech recognizer achieves 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
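The WER (word error rate) figures quoted in the abstract are conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With placeholder tokens, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words.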

Original language: English
Article number: 012043
Journal: Journal of Physics: Conference Series
Volume: 495
Issue number: 1
DOI: 10.1088/1742-6596/495/1/012043
Publication status: Published - 2014

Fingerprint

speech recognition
graphical user interface
synthesis
support systems
engines
Japan
education
modules
communication
operators
cells

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Implementation of the intelligent voice system for Kazakh. / Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

In: Journal of Physics: Conference Series, Vol. 495, No. 1, 012043, 2014.


@article{93ed4e3f40214f188fa1e73a64bb09d5,
title = "Implementation of the intelligent voice system for Kazakh",
abstract = "Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese and Russian. For Kazakh, the situation is less developed and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious if the country's large size and small population are considered. Landline and cell phones become the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The speech recognizer achieves 4.1{\%} WER on isolated word recognition and 6.9{\%} WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.",
author = "Zh Yessenbayev and N. Saparkhojayev and T. Tibeyev",
year = "2014",
doi = "10.1088/1742-6596/495/1/012043",
language = "English",
volume = "495",
journal = "Journal of Physics: Conference Series",
issn = "1742-6588",
publisher = "IOP Publishing Ltd.",
number = "1",

}

TY - JOUR

T1 - Implementation of the intelligent voice system for Kazakh

AU - Yessenbayev, Zh

AU - Saparkhojayev, N.

AU - Tibeyev, T.

PY - 2014

Y1 - 2014

N2 - Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese and Russian. For Kazakh, the situation is less developed and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious if the country's large size and small population are considered. Landline and cell phones become the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The speech recognizer achieves 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.

AB - Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this mostly concerns the languages of well-developed countries, such as English, German, Japanese and Russian. For Kazakh, the situation is less developed and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand for such a system is obvious if the country's large size and small population are considered. Landline and cell phones become the only means of communication for distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use the CMU Sphinx engine, and for speech synthesis, MaryTTS. The web-GUI is implemented in Java, enabling operators to quickly create and manage dialogs in a user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports technologies and protocols such as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus, with utterances from 169 native speakers. The speech recognizer achieves 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.

UR - http://www.scopus.com/inward/record.url?scp=84902440958&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902440958&partnerID=8YFLogxK

U2 - 10.1088/1742-6596/495/1/012043

DO - 10.1088/1742-6596/495/1/012043

M3 - Article

AN - SCOPUS:84902440958

VL - 495

JO - Journal of Physics: Conference Series

JF - Journal of Physics: Conference Series

SN - 1742-6588

IS - 1

M1 - 012043

ER -