TY - GEN
T1 - Creation and Annotation of a Handwritten Text Database for the Kazakh Language
T2 - 5th International Conference on Electrical, Communication and Computer Engineering, ICECCE 2024
AU - Yeleussinov, Arman
AU - Islamgozhayev, Talgat
AU - Kozhirbayev, Zhanibek
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This paper describes the creation of a handwritten text database for the Kazakh language, intended to address the lack of publicly available datasets in this area. While several databases exist for handwritten text recognition in other languages, such as IAM for English and RIMES for French, there is no equivalent resource for Kazakh, which uses the Cyrillic alphabet with additional unique characters. The paper explains the systematic process of building the database, from gathering samples from over 120 writers to annotating the text with tools such as LabelMe. The collection covers the 42 letters of the Kazakh alphabet and contains more than 75,000 handwritten characters. By introducing this dataset, we hope to advance research in optical character recognition (OCR) for the Kazakh language and lay the groundwork for future work in computer vision and text recognition.
KW - database of text
KW - handwritten text recognition
KW - Kazakh language
KW - optical character recognition
UR - http://www.scopus.com/inward/record.url?scp=85217245116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85217245116&partnerID=8YFLogxK
U2 - 10.1109/ICECCE63537.2024.10823470
DO - 10.1109/ICECCE63537.2024.10823470
M3 - Conference contribution
AN - SCOPUS:85217245116
T3 - 5th International Conference on Electrical, Communication and Computer Engineering, ICECCE 2024
BT - 5th International Conference on Electrical, Communication and Computer Engineering, ICECCE 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 October 2024 through 31 October 2024
ER -