TY - GEN
T1 - Towards Large Vocabulary Kazakh-Russian Sign Language Dataset
T2 - 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, sign-lang 2022
AU - Mukushev, Medet
AU - Kydyrbekova, Aigerim
AU - Kimmelman, Vadim
AU - Sandygulova, Anara
N1 - Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC 4.0.
PY - 2022
Y1 - 2022
N2 - This paper presents a new dataset for Kazakh-Russian Sign Language (KRSL) created for the purposes of Sign Language Processing. In 2020, Kazakhstan's schools were quickly switched to online mode due to the COVID-19 pandemic. Every working day, the El-arna TV channel broadcast video lessons for grades 1 to 11 with sign language interpretation. This opportunity allowed us to record a corpus with a large vocabulary and spontaneous sign language interpretation. The corpus contains video recordings of Kazakhstan's online school lessons interpreted into Kazakh-Russian Sign Language by 7 interpreters. So far, we have collected and cleaned 890 hours of video material. A custom annotation tool was created to make data annotation simple and easy for the Deaf community. To date, around 325 hours of video have been annotated with glosses, and 4,009 out of 4,547 lessons have been transcribed with automatic speech-to-text software. The KRSL-OnlineSchool dataset will be made publicly available at https://krslproject.github.io/online-school/.
AB - This paper presents a new dataset for Kazakh-Russian Sign Language (KRSL) created for the purposes of Sign Language Processing. In 2020, Kazakhstan's schools were quickly switched to online mode due to the COVID-19 pandemic. Every working day, the El-arna TV channel broadcast video lessons for grades 1 to 11 with sign language interpretation. This opportunity allowed us to record a corpus with a large vocabulary and spontaneous sign language interpretation. The corpus contains video recordings of Kazakhstan's online school lessons interpreted into Kazakh-Russian Sign Language by 7 interpreters. So far, we have collected and cleaned 890 hours of video material. A custom annotation tool was created to make data annotation simple and easy for the Deaf community. To date, around 325 hours of video have been annotated with glosses, and 4,009 out of 4,547 lessons have been transcribed with automatic speech-to-text software. The KRSL-OnlineSchool dataset will be made publicly available at https://krslproject.github.io/online-school/.
KW - kazakh-russian sign language
KW - sign language dataset
KW - sign language processing
UR - http://www.scopus.com/inward/record.url?scp=85146235719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146235719&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85146235719
T3 - 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, sign-lang 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
SP - 154
EP - 158
BT - 10th Workshop on the Representation and Processing of Sign Languages
A2 - Efthimiou, Eleni
A2 - Fotinea, Stavroula-Evita
A2 - Hanke, Thomas
A2 - Hochgesang, Julie A.
A2 - Kristoffersen, Jette
A2 - Mesch, Johanna
A2 - Schulder, Marc
PB - European Language Resources Association (ELRA)
Y2 - 20 June 2022 through 25 June 2022
ER -