Over the past decades, automatic speech recognition has made remarkable advances in both theory and practice. Research in this field has progressed from the recognition of individual sounds and phonemes to the recognition of continuous and mixed speech, including automatic transcription of broadcast news and telephone conversations. Although continuous speech recognition systems reach a recognition performance of up to 95%, the performance of phoneme recognition systems remains below 85%. Nevertheless, phoneme recognition is widely used in applications such as spoken term detection, language identification, and speaker identification. This paper presents the results of experiments on continuous Kazakh speech recognition using different phoneme sets and alternative phonetic transcriptions. The study was motivated by the fact that modern Kazakh linguistics has no common agreement on the phonetic system of the Kazakh language: the list of phonemes and their number vary noticeably across textbooks. We therefore designed our experiments to study how the phonetic system of the language, its orthoepic rules, and the corresponding phonetic transcriptions affect the performance of phoneme recognition systems, which form the initial stage of general continuous speech recognition systems. Six systems of phonetic transcription were considered and tested. The first is a draft of the new Kazakh alphabet with a set of spelling rules proposed by Prof. A. Sharipbay. The second is a set of orthoepic rules for the current Kazakh Cyrillic alphabet, introduced by the Kazakh linguists who authored the Kazakh “Orthoepical Dictionary”. The third is a phonetic system and a set of empirical transcription rules used by the authors of this work in their studies.
The fourth variant is based on the current Kazakh Cyrillic alphabet without any orthoepic rules, i.e. a transcription system in which one letter corresponds to one phoneme. The remaining two systems are variations or combinations of the systems mentioned above. In total, three series of experiments were conducted: one of word-based recognition and two of phone-based recognition, the latter two differing in their test sets. The word-based experiments did not reveal any notable differences in recognition performance among the systems studied, which is due to the strong influence of the language model on the decoding process. In contrast, the phone-based experiments showed that 1) the existing orthoepic rules do not fully reflect how Kazakh speech actually sounds, and 2) the existing phonetic system of the Kazakh language can be optimized by removing some phonemes. The speech recognition system is implemented with the Kaldi toolkit. Although the present work is a preliminary study, the experimental results make it possible to evaluate how well the considered phonetic transcriptions match the actual sound of Kazakh speech, and they can be of particular interest in view of the forthcoming transformation of the Kazakh writing system.
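As an illustration of the one-letter-one-phoneme transcription described above, the following minimal Python sketch builds pronunciation entries in the plain `word phone1 phone2 ...` layout that Kaldi lexicon files use. It is not the authors' script: the function names `grapheme_transcription` and `lexicon_lines` are hypothetical, and treating every alphabetic character (including the hard and soft signs) as its own phoneme is an assumption made here for simplicity.

```python
def grapheme_transcription(word):
    """Return one phoneme symbol per letter of the word.

    Assumption: every alphabetic character is its own phoneme;
    non-letters such as hyphens are skipped.
    """
    return [ch for ch in word.lower() if ch.isalpha()]


def lexicon_lines(words):
    """Format unique words as 'word phone1 phone2 ...' lines, sorted."""
    lines = []
    for word in sorted({w.lower() for w in words}):
        phones = grapheme_transcription(word)
        if phones:  # skip tokens that contain no letters
            lines.append(word + " " + " ".join(phones))
    return lines


if __name__ == "__main__":
    for line in lexicon_lines(["Қазақ", "тіл"]):
        print(line)
```

A transcription system with orthoepic rules would replace `grapheme_transcription` with a rule-based mapping, while the lexicon layout would stay the same.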
|Title of host publication||Proceedings of the 5th International Conference on Turkic Languages Processing (TurkLang 2017)|
|Place of Publication||Kazan, Tatarstan|
|Number of pages||129|
|Publication status||Published - Oct 2017|