Spoken term detection for Kazakh language

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The paper presents a spoken term detection system for the Kazakh language in which significant improvements are obtained by modifying the speech-to-text process used to generate word-based lattices. These lattices are indexed and later used for keyword search. Spoken term detection systems quickly locate occurrences of a term, which may be a single word or a sequence of words, in a large, heterogeneous collection of speech recordings. The paper gives an overview of a speech-to-text and keyword search architecture built primarily on top of the Kaldi toolkit and expands on a few highlights. Our aim was to develop a general system pipeline that can be extended with the phonological and linguistic features of the Kazakh language in order to detect out-of-vocabulary (OOV) keywords.
Original language: English
Title of host publication: The 4-th International Conference on Computer Processing of Turkic Languages “TurkLang 2016”
Place of Publication: Bishkek, Kyrgyz Republic
Number of pages: 52
Publication status: Published - Aug 2016


  • Speech Retrieval, Lattice Indexing, Spoken Term Detection, Speech Recognition, Keyword Search

