TY - GEN
T1 - KazNLP
T2 - 22nd International Conference on Speech and Computer, SPECOM 2020
AU - Yessenbayev, Zhandos
AU - Kozhirbayev, Zhanibek
AU - Makazhanov, Aibek
N1 - Funding Information:
Acknowledgments. Supported by the Ministry of Education and Science of the Republic of Kazakhstan under grants No. AP05134272 and No. AP08053085.
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
AB - We present the current results of our ongoing work on developing tools and algorithms for processing the Kazakh language within the framework of the KazNLP project. The project is motivated by the need for accessible, easy-to-use, cross-platform, and well-documented automated text processing tools for Kazakh, particularly for user-generated text, which includes transliteration, code switching, and other artifacts of language-specific raw data that need pre-processing. Thus, apart from a basic tokenization-tagging-parsing pipeline and downstream applications such as named entity recognition and spell checking, KazNLP offers pre-processing tools such as text normalization and language identification. All of the KazNLP tools are released under the Creative Commons license. Since detailed descriptions of the methods and algorithms used in KazNLP are published or will be published in various venues, references to which are given in the corresponding sections, this work provides only an overview of the tools and their performance levels.
KW - Computational linguistics
KW - Corpus linguistics
KW - Kazakh language
KW - Natural language processing
KW - Programming tools
UR - http://www.scopus.com/inward/record.url?scp=85092901401&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092901401&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-60276-5_63
DO - 10.1007/978-3-030-60276-5_63
M3 - Conference contribution
AN - SCOPUS:85092901401
SN - 9783030602758
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 657
EP - 666
BT - Speech and Computer - 22nd International Conference, SPECOM 2020, Proceedings
A2 - Karpov, Alexey
A2 - Potapova, Rodmonga
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 7 October 2020 through 9 October 2020
ER -