Data-driven morphological analysis and disambiguation for Kazakh

Olzhas Makhambetov, Aibek Makazhanov, Islam Sabyrgaliyev, Zhandos Yessenbayev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

We propose a method for morphological analysis and disambiguation of the Kazakh language that accounts for both inflectional and derivational morphology, including derivation that is not fully productive. The method is data-driven and does not require manually generated rules. We leverage so-called “transition chains”, which help prune false segmentations while keeping correct ones. At the disambiguation step, we use a standard HMM-based approach. Evaluating our method against open-source solutions on several data sets, we show that it performs on par with them or better. We also provide an extensive error analysis that sheds light on common problems in the morphological disambiguation of the language.
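
To make the disambiguation step concrete, here is a minimal sketch of generic HMM (Viterbi) decoding over per-token candidate analyses. It is not the authors' implementation: the function name `viterbi`, the tag labels, the probability tables, and the smoothing floor are all hypothetical, and emission probabilities are assumed uniform because each token is already restricted to its candidate analyses.

```python
import math


def viterbi(candidates, trans, start, floor=1e-6):
    """Choose one tag per token so that the tag sequence is most probable.

    candidates: one non-empty list of candidate tags per token (hypothetical input format)
    trans:      dict mapping (previous_tag, tag) to a transition probability
    start:      dict mapping tag to the probability of opening a sentence
    floor:      assumed probability for unseen events (illustrative smoothing)
    """
    # best[i][tag] = (log-score of the best path ending in tag at token i, backpointer)
    best = [{t: (math.log(start.get(t, floor)), None) for t in candidates[0]}]
    for i in range(1, len(candidates)):
        column = {}
        for t in candidates[i]:
            # Extend every path from the previous token and keep the best one.
            column[t] = max(
                (best[i - 1][p][0] + math.log(trans.get((p, t), floor)), p)
                for p in candidates[i - 1]
            )
        best.append(column)

    # Trace back from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(candidates) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy example with two ambiguous tokens (all values are made up for illustration).
candidates = [["NOUN", "VERB"], ["NOUN", "VERB"]]
start = {"NOUN": 0.7, "VERB": 0.3}
trans = {
    ("NOUN", "VERB"): 0.6,
    ("NOUN", "NOUN"): 0.3,
    ("VERB", "NOUN"): 0.5,
    ("VERB", "VERB"): 0.1,
}
print(viterbi(candidates, trans, start))  # ['NOUN', 'VERB']
```

In a real system the candidate lists would come from the morphological analyzer (after pruning), and the probability tables would be estimated from an annotated corpus.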

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 151-163
Number of pages: 13
Volume: 9041
ISBN (Print): 9783319181103
DOI: https://doi.org/10.1007/978-3-319-18111-0_12
Publication status: Published - 2015
Event: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2015 - Cairo, Egypt
Duration: Apr 14 2015 to Apr 20 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9041
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2015
Country: Egypt
City: Cairo
Period: 4/14/15 to 4/20/15

Fingerprint

Morphological Analysis
Data-driven
Error analysis
Pruning
Leverage
Open Source
Segmentation
Language

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Makhambetov, O., Makazhanov, A., Sabyrgaliyev, I., & Yessenbayev, Z. (2015). Data-driven morphological analysis and disambiguation for Kazakh. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9041, pp. 151-163). Springer Verlag. https://doi.org/10.1007/978-3-319-18111-0_12

@inproceedings{fbc37953d9724064bccf1b7ce029af2e,
title = "Data-driven morphological analysis and disambiguation for Kazakh",
abstract = "We propose a method for morphological analysis and disambiguation for Kazakh language that accounts for both inflectional and derivational morphology, including not fully productive derivation. The method is data-driven and does not require manually generated rules. We leverage so called “transition chains” that help pruning false segmentations, while keeping correct ones. At the disambiguation step we use a standard HMM-based approach. Evaluating our method against open source solutions on several data sets, we show that it achieves better or on par performance. We also provide an extensive error analysis that sheds light on common problems of the morphological disambiguation of the language.",
author = "Olzhas Makhambetov and Aibek Makazhanov and Islam Sabyrgaliyev and Zhandos Yessenbayev",
year = "2015",
doi = "10.1007/978-3-319-18111-0_12",
language = "English",
isbn = "9783319181103",
volume = "9041",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "151--163",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
address = "Germany",
}

TY - GEN
T1 - Data-driven morphological analysis and disambiguation for Kazakh
AU - Makhambetov, Olzhas
AU - Makazhanov, Aibek
AU - Sabyrgaliyev, Islam
AU - Yessenbayev, Zhandos
PY - 2015
Y1 - 2015
N2 - We propose a method for morphological analysis and disambiguation for Kazakh language that accounts for both inflectional and derivational morphology, including not fully productive derivation. The method is data-driven and does not require manually generated rules. We leverage so called “transition chains” that help pruning false segmentations, while keeping correct ones. At the disambiguation step we use a standard HMM-based approach. Evaluating our method against open source solutions on several data sets, we show that it achieves better or on par performance. We also provide an extensive error analysis that sheds light on common problems of the morphological disambiguation of the language.

UR - http://www.scopus.com/inward/record.url?scp=84942574744&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942574744&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-18111-0_12
DO - 10.1007/978-3-319-18111-0_12
M3 - Conference contribution
SN - 9783319181103
VL - 9041
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 151
EP - 163
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
ER -