Initial Experiments on Russian to Kazakh SMT

Research output: Contribution to journal › Article

Abstract

We present our initial experiments on Russian to Kazakh phrase-based statistical machine translation. Following a common approach to SMT between morphologically rich languages, we employ morphological processing techniques. Namely, for our initial experiments, we perform source-side lemmatization. Given a rather humble-sized parallel corpus at hand, we also put some effort into data cleaning and investigate the impact of the data quality vs. quantity trade-off on overall performance. Although our experiments mostly focus on source-side preprocessing, we achieve a substantial, statistically significant improvement over the baseline that operates on raw, unprocessed data.
Original language: Undefined/Unknown
Pages (from-to): 153-160
Number of pages: 8
Journal: Research in Computing Science
Volume: 117
Publication status: Published - 2016

Cite this

Initial Experiments on Russian to Kazakh SMT. / Myrzakhmetov, Bagdat; Makazhanov, Aibek.

In: Research in Computing Science, Vol. 117, 2016, p. 153-160.


@article{8f3809485da24111b67b5902343ccb55,
title = "Initial Experiments on Russian to Kazakh SMT",
abstract = "We present our initial experiments on Russian to Kazakh phrase-based statistical machine translation. Following a common approach to SMT between morphologically rich languages, we employ morphological processing techniques. Namely, for our initial experiments, we perform source-side lemmatization. Given a rather humble-sized parallel corpus at hand, we also put some effort into data cleaning and investigate the impact of the data quality vs. quantity trade-off on overall performance. Although our experiments mostly focus on source-side preprocessing, we achieve a substantial, statistically significant improvement over the baseline that operates on raw, unprocessed data.",
author = "Bagdat Myrzakhmetov and Aibek Makazhanov",
year = "2016",
language = "Undefined/Unknown",
volume = "117",
pages = "153--160",
journal = "Research in Computing Science",

}

TY - JOUR

T1 - Initial Experiments on Russian to Kazakh SMT

AU - Myrzakhmetov, Bagdat

AU - Makazhanov, Aibek

PY - 2016

Y1 - 2016

AB - We present our initial experiments on Russian to Kazakh phrase-based statistical machine translation. Following a common approach to SMT between morphologically rich languages, we employ morphological processing techniques. Namely, for our initial experiments, we perform source-side lemmatization. Given a rather humble-sized parallel corpus at hand, we also put some effort into data cleaning and investigate the impact of the data quality vs. quantity trade-off on overall performance. Although our experiments mostly focus on source-side preprocessing, we achieve a substantial, statistically significant improvement over the baseline that operates on raw, unprocessed data.

M3 - Article

VL - 117

SP - 153

EP - 160

JO - Research in Computing Science

JF - Research in Computing Science

ER -