Probabilistic analysis of word and sentence embeddings with applications to Kazakh language processing

Project: CRP

Project Details

Grant Program

Collaborative Research Grants Program 2020-2022

Project Description

We propose to develop a probabilistic theory of word vectors that is consistent with empirical observations from widely used word embedding models such as word2vec and PMI-based embeddings. We will then use this theory to investigate the properties of word and sentence embeddings. Finally, we will apply the results to train state-of-the-art Kazakh word vectors and use them to improve the existing morphological analyzer for Kazakh.
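For readers unfamiliar with the PMI family of models mentioned above, the following is a minimal sketch of the standard count-based construction: count word-context co-occurrences in a sliding window, convert the counts to positive pointwise mutual information (PPMI), and take a truncated SVD to obtain dense vectors. Skip-gram word2vec with negative sampling is known to implicitly factorize a shifted PMI matrix, which is why the two are often discussed together. This is an illustration under stated assumptions, not the project's own method: the helper names (`cooccurrence_counts`, `ppmi`, `svd_embeddings`), the symmetric window, and the toy corpus are all hypothetical choices.

```python
import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs within a symmetric window (hypothetical helper)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts


def ppmi(counts):
    """PMI(w, c) = log(#(w,c) * N / (#(w) * #(c))), clipped at zero."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # #(w)
    col = counts.sum(axis=0, keepdims=True)  # #(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs: log 0 = -inf, treated as 0
    return np.maximum(pmi, 0.0)


def svd_embeddings(matrix, dim):
    """Rank-`dim` word vectors U_d * sqrt(S_d) from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :dim] * np.sqrt(s[:dim])


if __name__ == "__main__":
    # Toy Kazakh corpus, already tokenized (illustrative only).
    corpus = [["мен", "кітап", "оқимын"], ["сен", "кітап", "оқисың"]]
    vocab, counts = cooccurrence_counts(corpus, window=1)
    vectors = svd_embeddings(ppmi(counts), dim=2)
    for word, vec in zip(vocab, vectors):
        print(word, vec.round(3))
```

Clipping negative PMI values to zero is a common design choice, because negative estimates from sparse counts are unreliable. For a morphologically rich language like Kazakh, word-level counts are especially sparse, which is one motivation for the subword-level modeling listed among the keywords.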
Status: Finished
Effective start/end date: 1/1/20 to 12/31/22

Keywords

  • word embeddings
  • statistical modeling
  • sentence embeddings
  • subword-level modeling
  • Kazakh

Research output

  • Convergence of the Partition Function in the Static Word Embedding Model

    Mynbaev, K. & Assylbekov, Z., 2022, In: Eurasian Mathematical Journal 13(4), p. 70-81 (12 p.).

    Research output: Contribution to journal, Article, peer-reviewed

  • Speeding Up Entmax

    Tezekbayev, M., Nikoulina, V., Gallé, M. & Assylbekov, Z., 2022, In: Findings of the Association for Computational Linguistics: NAACL 2022. Association for Computational Linguistics (ACL), p. 1142-1158 (17 p.).

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

  • Geometric Probing of Word Vectors

    Babazhanova, M., Tezekbayev, M. & Assylbekov, Z., 2021, In: ESANN 2021 Proceedings - 29th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. i6doc.com publication, p. 587-592 (6 p.).

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    Open Access
    2 Citations (Scopus)