Initial explorations in Kazakh to English statistical machine translation

Zhenisbek Assylbekov, Assulan Nurkas

Research output: Contribution to conference (Paper)

Abstract

This paper presents preliminary results from developing a statistical machine translation system from Kazakh to English. Starting with a baseline model trained first on 1.3K and then on 20K aligned sentences, we tried to cope with the complex morphology of Kazakh by applying different morphological word segmentation schemes to the training and test data. Morphological segmentation appears to benefit our system: our best segmentation scheme achieved a 28% reduction in the out-of-vocabulary rate and a 2.7-point BLEU improvement over the baseline.
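
To make the out-of-vocabulary (OOV) figure concrete, the following is a minimal sketch of how an OOV rate might be computed and why splitting words into morphemes can lower it. The token-level definition of OOV rate, the toy transliterated word lists, and the `segment` helper with its two-suffix list are all assumptions made for illustration; they are not the authors' data or segmentation schemes.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training data."""
    if not test_tokens:
        return 0.0
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical suffix list, loosely modeled on Kazakh plural and locative endings.
SUFFIXES = ("lar", "da")


def segment(word):
    """Greedily strip known suffixes from the right (toy stand-in for a real segmenter)."""
    parts = []
    stripped = True
    while stripped:
        stripped = False
        for suf in SUFFIXES:
            # Keep at least two characters of stem so short words stay whole.
            if word.endswith(suf) and len(word) > len(suf) + 1:
                parts.insert(0, "+" + suf)
                word = word[: -len(suf)]
                stripped = True
                break
    return [word] + parts


def segment_all(tokens):
    """Apply the toy segmenter to every token and flatten the result."""
    return [piece for tok in tokens for piece in segment(tok)]


# Toy data (assumed): inflected forms in the test set are unseen as whole words.
train = ["qala", "qalalar", "bala", "balada"]
test = ["qalada", "balalar", "qalalarda", "bala"]

baseline = oov_rate(train, test)
segmented = oov_rate(segment_all(train), segment_all(test))
print(f"OOV rate without segmentation: {baseline:.2f}")
print(f"OOV rate with segmentation:    {segmented:.2f}")
```

In this toy setting every unseen inflected form decomposes into a stem and suffixes that were each seen in training, so the OOV rate drops. Real corpora and segmenters behave far less cleanly, which is consistent with the partial reduction reported in the abstract.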
Original language: English
Pages: 12
Number of pages: 16
Publication status: Published - Dec 9 2014
Event: The First Italian Conference on Computational Linguistics - Pisa, Italy
Duration: Dec 8 2014 to Dec 9 2014
http://www.fileli.unipi.it/projects/clic/en/

Conference

Conference: The First Italian Conference on Computational Linguistics
Abbreviated title: CLiC-it 2014
Country: Italy
City: Pisa
Period: 12/8/14 to 12/9/14

Cite this

Assylbekov, Z., & Nurkas, A. (2014). Initial explorations in Kazakh to English statistical machine translation. 12. Paper presented at The First Italian Conference on Computational Linguistics, Pisa, Italy.

@conference{2d3600dfab1c4161b1c0898d5259b0de,
title = "Initial explorations in Kazakh to English statistical machine translation",
abstract = "This paper presents preliminary results from developing a statistical machine translation system from Kazakh to English. Starting with a baseline model trained first on 1.3K and then on 20K aligned sentences, we tried to cope with the complex morphology of Kazakh by applying different morphological word segmentation schemes to the training and test data. Morphological segmentation appears to benefit our system: our best segmentation scheme achieved a 28{\%} reduction in the out-of-vocabulary rate and a 2.7-point BLEU improvement over the baseline.",
author = "Zhenisbek Assylbekov and Assulan Nurkas",
year = "2014",
month = "12",
day = "9",
language = "English",
pages = "12",
note = "The First Italian Conference on Computational Linguistics, CLiC-it 2014 ; Conference date: 08-12-2014 Through 09-12-2014",
url = "http://www.fileli.unipi.it/projects/clic/en/",

}

TY - CONF

T1 - Initial explorations in Kazakh to English statistical machine translation

AU - Assylbekov, Zhenisbek

AU - Nurkas, Assulan

PY - 2014/12/9

Y1 - 2014/12/9

N2 - This paper presents preliminary results from developing a statistical machine translation system from Kazakh to English. Starting with a baseline model trained first on 1.3K and then on 20K aligned sentences, we tried to cope with the complex morphology of Kazakh by applying different morphological word segmentation schemes to the training and test data. Morphological segmentation appears to benefit our system: our best segmentation scheme achieved a 28% reduction in the out-of-vocabulary rate and a 2.7-point BLEU improvement over the baseline.

M3 - Paper

SP - 12

ER -