Experiments with Russian to Kazakh sentence alignment

Research output: Contribution to conference (Paper)

Abstract

Sentence alignment is the final step in building parallel corpora, which arguably has the greatest impact on the quality of a resulting corpus and the accuracy of machine translation systems that use it for training. However, the quality of sentence alignment itself depends on a number of factors. In this paper we investigate the impact of several data processing techniques on the quality of sentence alignment. We develop and use a number of automatic evaluation metrics, and provide empirical evidence that application of all of the considered data processing techniques yields bitexts with the lowest ratio of noise and the highest ratio of parallel sentences.
Original language: English
Publication status: Published - 2016
Event: The 4-th International Conference on Computer Processing of Turkic Languages - Bishkek, Kyrgyzstan
Duration: Aug 23 2016 to Aug 25 2016
http://turklang.kz/en/index.php

Conference

Conference: The 4-th International Conference on Computer Processing of Turkic Languages
Abbreviated title: TurkLang 2016
Country: Kyrgyzstan
City: Bishkek
Period: 8/23/16 to 8/25/16
Internet address: http://turklang.kz/en/index.php

Cite this

Assylbekov, Z., Makazhanov, A., & Myrzakhmetov, B. (2016). Experiments with Russian to Kazakh sentence alignment. Paper presented at The 4-th International Conference on Computer Processing of Turkic Languages, Bishkek, Kyrgyzstan.

Experiments with Russian to Kazakh sentence alignment. / Assylbekov, Zhenisbek; Makazhanov, Aibek; Myrzakhmetov, Bagdat.

2016. Paper presented at The 4-th International Conference on Computer Processing of Turkic Languages, Bishkek, Kyrgyzstan.

Assylbekov, Z, Makazhanov, A & Myrzakhmetov, B 2016, 'Experiments with Russian to Kazakh sentence alignment', Paper presented at The 4-th International Conference on Computer Processing of Turkic Languages, Bishkek, Kyrgyzstan, 8/23/16 - 8/25/16.
Assylbekov Z, Makazhanov A, Myrzakhmetov B. Experiments with Russian to Kazakh sentence alignment. 2016. Paper presented at The 4-th International Conference on Computer Processing of Turkic Languages, Bishkek, Kyrgyzstan.
Assylbekov, Zhenisbek ; Makazhanov, Aibek ; Myrzakhmetov, Bagdat. / Experiments with Russian to Kazakh sentence alignment. Paper presented at The 4-th International Conference on Computer Processing of Turkic Languages, Bishkek, Kyrgyzstan.
@conference{bcc7a22208d54d218e945b2da551ddba,
title = "Experiments with Russian to Kazakh sentence alignment",
abstract = "Sentence alignment is the final step in building parallel corpora, which arguably has the greatest impact on the quality of a resulting corpus and the accuracy of machine translation systems that use it for training. However, the quality of sentence alignment itself depends on a number of factors. In this paper we investigate the impact of several data processing techniques on the quality of sentence alignment. We develop and use a number of automatic evaluation metrics, and provide empirical evidence that application of all of the considered data processing techniques yields bitexts with the lowest ratio of noise and the highest ratio of parallel sentences.",
author = "Zhenisbek Assylbekov and Aibek Makazhanov and Bagdat Myrzakhmetov",
year = "2016",
language = "English",
note = "The 4-th International Conference on Computer Processing of Turkic Languages, TurkLang 2016 ; Conference date: 23-08-2016 Through 25-08-2016",
url = "http://turklang.kz/en/index.php",

}

TY - CONF
T1 - Experiments with Russian to Kazakh sentence alignment
AU - Assylbekov, Zhenisbek
AU - Makazhanov, Aibek
AU - Myrzakhmetov, Bagdat
PY - 2016
Y1 - 2016
N2 - Sentence alignment is the final step in building parallel corpora, which arguably has the greatest impact on the quality of a resulting corpus and the accuracy of machine translation systems that use it for training. However, the quality of sentence alignment itself depends on a number of factors. In this paper we investigate the impact of several data processing techniques on the quality of sentence alignment. We develop and use a number of automatic evaluation metrics, and provide empirical evidence that application of all of the considered data processing techniques yields bitexts with the lowest ratio of noise and the highest ratio of parallel sentences.
AB - Sentence alignment is the final step in building parallel corpora, which arguably has the greatest impact on the quality of a resulting corpus and the accuracy of machine translation systems that use it for training. However, the quality of sentence alignment itself depends on a number of factors. In this paper we investigate the impact of several data processing techniques on the quality of sentence alignment. We develop and use a number of automatic evaluation metrics, and provide empirical evidence that application of all of the considered data processing techniques yields bitexts with the lowest ratio of noise and the highest ratio of parallel sentences.
M3 - Paper
ER -