Data-driven morphological analysis and disambiguation for Kazakh

Olzhas Makhambetov, Aibek Makazhanov, Islam Sabyrgaliyev, Zhandos Yessenbayev

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

We propose a method for morphological analysis and disambiguation for the Kazakh language that accounts for both inflectional and derivational morphology, including derivation that is not fully productive. The method is data-driven and does not require manually written rules. We leverage so-called “transition chains”, which help prune false segmentations while keeping correct ones. At the disambiguation step we use a standard HMM-based approach. Evaluated against open-source solutions on several data sets, our method performs better than or on par with them. We also provide an extensive error analysis that sheds light on common problems in the morphological disambiguation of Kazakh.
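The abstract says only that disambiguation uses a standard HMM-based approach and gives no further detail. As a rough illustration of what such a step can look like, the sketch below runs Viterbi decoding over the candidate tags of each token with a bigram HMM. It is not the authors' implementation: the function name `viterbi_disambiguate`, the tag labels "N" and "V", and all probabilities are assumptions made up for this example.

```python
import math


def viterbi_disambiguate(candidates, trans, emit, start, floor=1e-6):
    """Choose one tag per token from its candidate tags (bigram HMM, Viterbi).

    candidates: list of non-empty tag lists, one list per token
    trans:      dict mapping (prev_tag, cur_tag) -> transition probability
    emit:       list of dicts, emit[i][tag] = P(token_i | tag)
    start:      dict mapping tag -> initial probability
    floor:      small probability used for unseen events (illustrative smoothing)
    """
    def lp(p):
        return math.log(max(p, floor))

    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (lp(start.get(t, 0.0)) + lp(emit[0].get(t, 0.0)), None)
             for t in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for cur in candidates[i]:
            score, prev = max(
                (best[i - 1][p][0] + lp(trans.get((p, cur), 0.0)), p)
                for p in candidates[i - 1]
            )
            layer[cur] = (score + lp(emit[i].get(cur, 0.0)), prev)
        best.append(layer)

    # Follow the back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(candidates) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy data (hypothetical): two tokens, each ambiguous between noun and verb.
toy_candidates = [["N", "V"], ["N", "V"]]
toy_start = {"N": 0.7, "V": 0.3}
toy_trans = {("N", "N"): 0.4, ("N", "V"): 0.6, ("V", "N"): 0.5, ("V", "V"): 0.5}
toy_emit = [{"N": 0.5, "V": 0.5}, {"N": 0.5, "V": 0.5}]
toy_result = viterbi_disambiguate(toy_candidates, toy_trans, toy_emit, toy_start)
```

In a real analyzer the candidate lists would come from the segmentation step and the probabilities would be estimated from an annotated corpus. The toy numbers here only show how context resolves an ambiguity that the emission scores alone cannot.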

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 16th International Conference, CICLing 2015, Proceedings
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 151-163
Number of pages: 13
ISBN (Print): 9783319181103
Publication status: Published - Jan 1 2015
Event: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2015 - Cairo, Egypt
Duration: Apr 14 2015 to Apr 20 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9041
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2015
Country: Egypt
City: Cairo
Period: 4/14/15 to 4/20/15

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
