Towards Large Vocabulary Kazakh-Russian Sign Language Dataset: KRSL-OnlineSchool

Medet Mukushev, Aigerim Kydyrbekova, Vadim Kimmelman, Anara Sandygulova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper presents a new dataset for Kazakh-Russian Sign Language (KRSL) created for the purposes of Sign Language Processing. In 2020, Kazakhstan's schools switched quickly to online mode due to the COVID-19 pandemic. Every working day, the El-arna TV channel broadcast video lessons for grades 1 to 11 with sign language translation. This gave us the opportunity to record a corpus with a large vocabulary and spontaneous SL interpretation. The corpus contains video recordings of Kazakhstan's online school translated into Kazakh-Russian Sign Language by 7 interpreters. So far, we have collected and cleaned 890 hours of video material. A custom annotation tool was created to make data annotation simple and easy for the Deaf community to use. To date, around 325 hours of video have been annotated with glosses, and 4,009 of 4,547 lessons have been transcribed with automatic speech-to-text software. The KRSL-OnlineSchool dataset will be made publicly available at https://krslproject.github.io/online-school/.

Original language: English
Title of host publication: 10th Workshop on the Representation and Processing of Sign Languages
Subtitle of host publication: Multilingual Sign Language Resources, sign-lang 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Editors: Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, Julie A. Hochgesang, Jette Kristoffersen, Johanna Mesch, Marc Schulder
Publisher: European Language Resources Association (ELRA)
Pages: 154-158
Number of pages: 5
ISBN (Electronic): 9791095546863
Publication status: Published - 2022
Event: 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, sign-lang 2022 - Marseille, France
Duration: Jun 20 2022 → Jun 25 2022

Publication series

Name: 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, sign-lang 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings

Conference

Conference: 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, sign-lang 2022
Country/Territory: France
City: Marseille
Period: 6/20/22 → 6/25/22

Keywords

  • kazakh-russian sign language
  • sign language dataset
  • sign language processing

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Library and Information Sciences
  • Linguistics and Language
