Creation and Annotation of a Handwritten Text Database for the Kazakh Language: Methodologies and Preliminary Results

Arman Yeleussinov, Talgat Islamgozhayev, Zhanibek Kozhirbayev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes the creation of a handwritten text database for the Kazakh language, intended to address the lack of publicly available datasets in this area. While several databases exist for handwritten text recognition in other languages, such as IAM for English and RIMES for French, there is no equivalent resource for Kazakh, which uses a Cyrillic alphabet with additional language-specific characters. The paper explains the systematic process of building the database, from gathering samples from over 120 writers to annotating the text with tools such as LabelMe. The collection covers the 42 letters of the Kazakh alphabet and contains more than 75,000 handwritten characters. By introducing this dataset, we hope to advance research in optical character recognition (OCR) for the Kazakh language and lay the groundwork for future work in computer vision and text recognition.
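
As an illustration of the annotation step, the following is a minimal sketch of how per-letter character counts might be tallied from a LabelMe annotation file. It assumes the standard LabelMe JSON layout (a top-level `shapes` list whose entries carry a `label`) and assumes that each shape is labeled with the letter it contains; the abstract does not specify the labeling scheme. The function name `count_labels`, the file name `page_001.png`, and the sample coordinates are hypothetical.

```python
import json
from collections import Counter


def count_labels(labelme_json_text):
    """Tally shape labels in one LabelMe annotation document (given as JSON text)."""
    doc = json.loads(labelme_json_text)
    # Each entry in "shapes" is one annotated region; "label" is its class name.
    return Counter(shape["label"] for shape in doc.get("shapes", []))


# Hypothetical annotation for one scanned page: three character boxes.
sample = json.dumps({
    "version": "5.0.1",
    "flags": {},
    "shapes": [
        {"label": "ә", "points": [[10, 12], [34, 40]],
         "group_id": None, "shape_type": "rectangle", "flags": {}},
        {"label": "қ", "points": [[40, 11], [66, 42]],
         "group_id": None, "shape_type": "rectangle", "flags": {}},
        {"label": "ә", "points": [[70, 13], [95, 41]],
         "group_id": None, "shape_type": "rectangle", "flags": {}},
    ],
    "imagePath": "page_001.png",
    "imageData": None,
    "imageHeight": 200,
    "imageWidth": 400,
})

counts = count_labels(sample)
total_characters = sum(counts.values())
```

Summing such counters over all annotation files would give a per-letter distribution for the whole collection, which is one way to check coverage of the 42 letters.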

Original language: English
Title of host publication: 5th International Conference on Electrical, Communication and Computer Engineering, ICECCE 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331529437
Publication status: Published - 2024
Event: 5th International Conference on Electrical, Communication and Computer Engineering, ICECCE 2024 - Kuala Lumpur, Malaysia
Duration: Oct 30 2024 to Oct 31 2024

Publication series

Name: 5th International Conference on Electrical, Communication and Computer Engineering, ICECCE 2024

Conference

Conference: 5th International Conference on Electrical, Communication and Computer Engineering, ICECCE 2024
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 10/30/24 to 10/31/24

Keywords

  • database of text
  • handwritten text recognition
  • Kazakh language
  • optical character recognition

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Safety, Risk, Reliability and Quality
  • Control and Optimization
  • Modelling and Simulation
