On certain aspects of Kazakh part-of-speech tagging

Aibek Makazhanov, Zhandos Yessenbayev, Islam Sabyrgaliyev, Anuar Sharafudinov, Olzhas Makhambetov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

We compare and discuss various approaches to the problem of part-of-speech (POS) tagging of texts written in Kazakh, an agglutinative and highly inflectional Turkic language. In Kazakh, a single root may produce hundreds of word forms, and it is difficult, if at all possible, to label enough training data to account for the vast set of all possible word forms in the language. Thus, current state-of-the-art statistical POS taggers may not be as effective for Kazakh as for morphologically less complex languages, e.g. English. Also, the choice of a POS tag set may influence the informativeness and the accuracy of tagging.

Original language: English
Title of host publication: 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479941209
DOI: https://doi.org/10.1109/ICAICT.2014.7035953
Publication status: Published - 2014
Event: 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Astana, Kazakhstan
Duration: Oct 15 2014 to Oct 17 2014

Other

Other: 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014
Country: Kazakhstan
City: Astana
Period: 10/15/14 to 10/17/14

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computer Networks and Communications

Cite this

Makazhanov, A., Yessenbayev, Z., Sabyrgaliyev, I., Sharafudinov, A., & Makhambetov, O. (2014). On certain aspects of Kazakh part-of-speech tagging. In 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings [7035953] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICAICT.2014.7035953

On certain aspects of Kazakh part-of-speech tagging. / Makazhanov, Aibek; Yessenbayev, Zhandos; Sabyrgaliyev, Islam; Sharafudinov, Anuar; Makhambetov, Olzhas.

8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings. Institute of Electrical and Electronics Engineers Inc., 2014. 7035953.

Makazhanov, A, Yessenbayev, Z, Sabyrgaliyev, I, Sharafudinov, A & Makhambetov, O 2014, On certain aspects of Kazakh part-of-speech tagging. in 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings, 7035953, Institute of Electrical and Electronics Engineers Inc., 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014, Astana, Kazakhstan, 10/15/14. https://doi.org/10.1109/ICAICT.2014.7035953
Makazhanov A, Yessenbayev Z, Sabyrgaliyev I, Sharafudinov A, Makhambetov O. On certain aspects of Kazakh part-of-speech tagging. In 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings. Institute of Electrical and Electronics Engineers Inc. 2014. 7035953 https://doi.org/10.1109/ICAICT.2014.7035953
Makazhanov, Aibek ; Yessenbayev, Zhandos ; Sabyrgaliyev, Islam ; Sharafudinov, Anuar ; Makhambetov, Olzhas. / On certain aspects of Kazakh part-of-speech tagging. 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings. Institute of Electrical and Electronics Engineers Inc., 2014.
@inproceedings{6e54a7b8404e49558e735a73096cf80a,
title = "On certain aspects of Kazakh part-of-speech tagging",
abstract = "We compare and discuss various approaches to the problem of part of speech (POS) tagging of texts written in Kazakh, an agglutinative and highly inflectional Turkic language. In Kazakh a single root may produce hundreds of word forms, and it is difficult, if at all possible, to label enough training data to account for a vast set of all possible word forms in the language. Thus, current state of the art statistical POS taggers may not be as effective for Kazakh as for morphologically less complex languages, e.g. English. Also the choice of a POS tag set may influence the informativeness and the accuracy of tagging.",
author = "Aibek Makazhanov and Zhandos Yessenbayev and Islam Sabyrgaliyev and Anuar Sharafudinov and Olzhas Makhambetov",
year = "2014",
doi = "10.1109/ICAICT.2014.7035953",
language = "English",
booktitle = "8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
}

TY  - GEN
T1  - On certain aspects of Kazakh part-of-speech tagging
AU  - Makazhanov, Aibek
AU  - Yessenbayev, Zhandos
AU  - Sabyrgaliyev, Islam
AU  - Sharafudinov, Anuar
AU  - Makhambetov, Olzhas
PY  - 2014
Y1  - 2014
N2  - We compare and discuss various approaches to the problem of part of speech (POS) tagging of texts written in Kazakh, an agglutinative and highly inflectional Turkic language. In Kazakh a single root may produce hundreds of word forms, and it is difficult, if at all possible, to label enough training data to account for a vast set of all possible word forms in the language. Thus, current state of the art statistical POS taggers may not be as effective for Kazakh as for morphologically less complex languages, e.g. English. Also the choice of a POS tag set may influence the informativeness and the accuracy of tagging.
AB  - We compare and discuss various approaches to the problem of part of speech (POS) tagging of texts written in Kazakh, an agglutinative and highly inflectional Turkic language. In Kazakh a single root may produce hundreds of word forms, and it is difficult, if at all possible, to label enough training data to account for a vast set of all possible word forms in the language. Thus, current state of the art statistical POS taggers may not be as effective for Kazakh as for morphologically less complex languages, e.g. English. Also the choice of a POS tag set may influence the informativeness and the accuracy of tagging.
UR  - http://www.scopus.com/inward/record.url?scp=84988269273&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84988269273&partnerID=8YFLogxK
U2  - 10.1109/ICAICT.2014.7035953
DO  - 10.1109/ICAICT.2014.7035953
M3  - Conference contribution
BT  - 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference Proceedings
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -