Collaborative Research Grants Program 2020-2022
We propose to develop a probabilistic theory of word vectors that is consistent with empirical observations from widely used word embedding models such as word2vec and PMI-based methods. We will then use the theory to investigate properties of word and sentence embeddings. Finally, we will apply the results to train state-of-the-art Kazakh word vectors and use them to improve the existing morphological analyzer for Kazakh.
Effective start/end date: 1/1/20 → 12/31/22
- word embeddings
- statistical modeling
- sentence embeddings
- subword-level modeling