Project Details
Grant Program
Ministry of Education and Science - Grant Funding 2018-2020
Project Description
Deep neural networks have achieved remarkable success in many areas of research, including natural language processing (NLP) and natural language understanding. State-of-the-art models are usually large (involving hundreds of millions of trainable parameters) and often have sophisticated architectures. In this project we raise the following question: Is it possible to develop neural architectures for several NLP tasks that reach state-of-the-art performance for a variety of languages, including Kazakh, but are smaller, more stable and structurally simpler than the existing models? We will consider this question along three axes: architecture-wise, task-wise and language-wise.
It is well known that modern neural networks require large amounts of supervised/unsupervised data, and they often cannot match the predictive accuracy of classical rule-based methods for low-resource languages such as Kazakh. Therefore, we will continue our previous efforts on developing resources for Kazakh, such as corpora (including annotated texts), a morphological analyzer and a disambiguation tool.
The results of the research will be published in international journals and conference proceedings, and all the resources and tools will be made publicly available as free/open-source software. This will facilitate research by national and international scientific communities on NLP in general and on the Kazakh language in particular. Successful implementation of the Project will result in (a) better neural architectures for language modeling, word embeddings and machine translation in general; and (b) state-of-the-art free/open-source systems for Kazakh language modeling, word embeddings and morphological disambiguation.
Status | Finished |
---|---|
Effective start/end date | 4/6/18 → 12/31/20 |