Deep Learning for Sequential Models in Natural Language Processing with Applications to Kazakh

  • Assylbekov, Zhenisbek (PI)
  • Takhanov, Rustem (Other Faculty/Researcher)
  • Kashkynbayev, Ardak (Other Faculty/Researcher)
  • Myrzakhmetov, Bagdat (Master student/Bachelor degree holder)
  • Salimzianov, Ilnar (Master student/Bachelor degree holder)
  • Baltabayeva, Assel (PhD student/Master degree holder)
  • Tanekeyev, Gabidin (PhD student/Master degree holder)
  • Bissengaliyeva, Dariya (Master student/Bachelor degree holder)
  • Tuleuov, Zeinulla (PhD student/Master degree holder)
  • Sadvakasov, Samat (PhD student/Master degree holder)
  • Bisazza, Arianna (Other participant)

Project: Government

Project Details

Grant Program

Ministry of Education and Science - Grant Funding 2018-2020

Project Description

Deep neural networks have recently achieved remarkable success in many areas of research, including natural language processing (NLP) and natural language understanding. State-of-the-art models are usually large (involving hundreds of millions of trainable parameters) and often have sophisticated architectures. In this project we raise the following question: is it possible to develop neural architectures for several NLP tasks that achieve state-of-the-art performance for a variety of languages, including Kazakh, while being smaller, more stable and structurally simpler than existing models? We will consider this question along three axes: architecture-wise, task-wise and language-wise.
It is well known that modern neural networks require large amounts of supervised and unsupervised data, and for low-resource languages such as Kazakh they often cannot match the predictive accuracy of classical rule-based methods. Therefore, we will continue our previous efforts to develop resources for Kazakh, such as corpora (including annotated texts), a morphological analyzer and a morphological disambiguation tool.
The results of the research will be published in international journals and conference proceedings, and all resources and tools will be made publicly available as free/open-source software. This will facilitate research on NLP in general, and on the Kazakh language in particular, by national and international scientific communities. Successful implementation of the project will result in (a) better neural architectures for language modeling, word embeddings and machine translation in general; and (b) state-of-the-art free/open-source systems for Kazakh language modeling, word embeddings and morphological disambiguation.
Status: Finished
Effective start/end date: 4/6/18 to 12/31/20
