TY - JOUR
T1 - Saraiki language characters dataset (SLCD)
AU - Khan, Muhammad Ahmad
AU - Khan, Khalil
AU - Aloraini, Abdulrahman
AU - Khan, Rehan Ullah
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2024/6
Y1 - 2024/6
N2 - About 26 million people worldwide use the Saraiki language [1]. Saraiki is extensively spoken in the southern part of Punjab and Sindh. Dera Ghazi Khan is one of the most important Saraiki cultural hubs, and over 90% of its population speaks the language. Calligraphers use a sophisticated script to write this language. Despite the vast body of Optical Character Recognition (OCR) research dedicated to other languages, a fully functional OCR system is still needed for the Saraiki language [2,3]. This work presents a genuine dataset of Saraiki handwritten characters, consisting of 50,000 scanned images, and makes it publicly accessible for scientific research. All of the images contain handwritten text contributed by teachers and students from Pak-Austria Fachhochschule for Applied Sciences and Technology, Pakistan. Around 1000 people, roughly half men and half women, contributed to writing this text.
AB - About 26 million people worldwide use the Saraiki language [1]. Saraiki is extensively spoken in the southern part of Punjab and Sindh. Dera Ghazi Khan is one of the most important Saraiki cultural hubs, and over 90% of its population speaks the language. Calligraphers use a sophisticated script to write this language. Despite the vast body of Optical Character Recognition (OCR) research dedicated to other languages, a fully functional OCR system is still needed for the Saraiki language [2,3]. This work presents a genuine dataset of Saraiki handwritten characters, consisting of 50,000 scanned images, and makes it publicly accessible for scientific research. All of the images contain handwritten text contributed by teachers and students from Pak-Austria Fachhochschule for Applied Sciences and Technology, Pakistan. Around 1000 people, roughly half men and half women, contributed to writing this text.
KW - Machine learning
KW - Natural language processing
KW - Optical character recognition
KW - Text recognition
UR - http://www.scopus.com/inward/record.url?scp=85192447669&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192447669&partnerID=8YFLogxK
U2 - 10.1016/j.dib.2024.110473
DO - 10.1016/j.dib.2024.110473
M3 - Article
AN - SCOPUS:85192447669
SN - 2352-3409
VL - 54
JO - Data in Brief
JF - Data in Brief
M1 - 110473
ER -