Probabilistic analysis of word and sentence embeddings with applications to Kazakh language processing

Project: Research project

Grant Program

Collaborative Research Grants Program 2020-2022

Project Description

We propose to develop a probabilistic theory of word vectors that is consistent with empirical observations from widely used word embedding models such as word2vec and PMI-based models. We will then use this theory to investigate the properties of word and sentence embeddings. Finally, we will apply the results to train state-of-the-art Kazakh word vectors and use them to improve the existing morphological analyzer for Kazakh.
Status: Active
Effective start/end date: 1/1/20 to 12/31/22

Keywords

  • word embeddings
  • statistical modeling
  • sentence embeddings
  • subword-level modeling
  • Kazakh