Initial explorations in Kazakh to English statistical machine translation

Zhenisbek Assylbekov, Assulan Nurkas

Research output: Contribution to conference › Paper › peer-review


This paper presents preliminary results of developing a statistical machine translation system from Kazakh to English. Starting with a baseline model trained on 1.3K and then on 20K aligned sentences, we tried to cope with the complex morphology of Kazakh by applying different schemes of morphological word segmentation to the training and test data. Morphological segmentation appears to benefit our system: our best segmentation scheme achieved a 28% reduction in the out-of-vocabulary rate and a 2.7-point BLEU improvement over the baseline.
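As a rough illustration of the out-of-vocabulary measurement mentioned in the abstract, the sketch below computes a token-level OOV rate before and after a toy suffix segmentation. Everything in it is an assumption made for illustration: the suffix list, the `+` marker convention, the transliterated example words, and the function names are hypothetical and are not the segmentation schemes or data used in the paper.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training data."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical suffix inventory, for illustration only:
# "lar" (plural) and "da" (locative), in a simple transliteration.
TOY_SUFFIXES = ("lar", "da")


def toy_segment(word):
    """Split off at most one known suffix, marking it with a leading '+'."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def segment_all(words):
    """Apply the toy segmentation to every word and flatten the result."""
    return [piece for word in words for piece in toy_segment(word)]


# Tiny made-up "corpora" of transliterated word forms.
train = ["bala", "balalar", "qala"]
test = ["qalalar", "qalada", "bala"]

baseline = oov_rate(train, test)
segmented = oov_rate(segment_all(train), segment_all(test))

# Relative reduction in OOV rate (one possible reading of "28% reduction").
relative_reduction = (baseline - segmented) / baseline

print(f"baseline OOV rate:  {baseline:.3f}")
print(f"segmented OOV rate: {segmented:.3f}")
print(f"relative reduction: {relative_reduction:.1%}")
```

The same segmentation is applied to both training and test data, as the abstract describes. Unseen inflected forms such as `qalalar` then decompose into a stem and a suffix that were each observed in training, which is what lowers the OOV rate in this toy setting.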
Original language: English
Number of pages: 16
Publication status: Published - Dec 9 2014
Event: The First Italian Conference on Computational Linguistics - Pisa, Italy
Duration: Dec 8 2014 to Dec 9 2014


Conference: The First Italian Conference on Computational Linguistics
Abbreviated title: CLiC-it 2014