TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language

Daniil Orel, Askat Kuzdeuov, Rinat Gilmullin, Bulat Khakimov, Huseyin Atakan Varol

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces an open-source dataset for speech synthesis in the Tatar language. The dataset comprises approximately 70 hours of transcribed audio recordings, featuring two professional speakers (one male and one female). Notably, it is the first large-scale dataset of its kind that is publicly available, aimed at promoting Tatar text-to-speech (TTS) applications in both academic and industrial contexts. The paper describes the procedures for developing the dataset, discusses the challenges faced, and outlines important future directions. To demonstrate the reliability of the dataset, baseline end-to-end TTS models were built and evaluated using the subjective mean opinion score (MOS) measure. The dataset, training recipe, and pretrained TTS models are publicly available.

Original language: English
Title of host publication: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 717-721
Number of pages: 5
ISBN (Electronic): 9798350344349
Publication status: Published - 2024
Event: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024 - Osaka, Japan
Duration: Feb 19 2024 to Feb 22 2024

Publication series

Name: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024

Conference

Conference: 6th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2024
Country/Territory: Japan
City: Osaka
Period: 2/19/24 to 2/22/24

Keywords

  • low-resource languages
  • speech synthesis
  • Text-to-speech
  • Turkic languages

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Safety, Risk, Reliability and Quality
  • Health Informatics
