TY - GEN
T1 - Comparison of Word Embeddings of Unaligned Audio and Text Data Using Persistent Homology
AU - Yessenbayev, Zhandos
AU - Kozhirbayev, Zhanibek
N1 - Funding Information:
The work is supported by the Ministry of Education and Science of the Republic of Kazakhstan under the grants No. AP13068635 and No. AP08053085. We also thank Dr. Nikolay Makarenko from The Central Astronomical Observatory of the Russian Academy of Sciences at Pulkovo for his invaluable comments and lecture notes on the topic of topological data analysis.
Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - We have performed preliminary work on topological analysis of audio and text data for unsupervised speech processing. The work is based on the assumption that phoneme frequencies and contextual relationships are similar in the acoustic and text domains for the same language. Accordingly, this allowed the creation of a mapping between these spaces that takes into account their geometric structure. As a first step, generative methods based on variational autoencoders were chosen to map audio and text data into two latent vector spaces. In the next stage, persistent homology methods are used to analyze the topological structure of two spaces. Although the results obtained support the idea of the similarity of the two spaces, further research is needed to correctly map acoustic and text spaces, as well as to evaluate the real effect of including topological information in the autoencoder training process.
KW - Persistent homology and diagram
KW - Topological data analysis
KW - Unsupervised processing
KW - Word embeddings
UR - http://www.scopus.com/inward/record.url?scp=85142749684&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142749684&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-20980-2_59
DO - 10.1007/978-3-031-20980-2_59
M3 - Conference contribution
AN - SCOPUS:85142749684
SN - 9783031209796
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 700
EP - 711
BT - Speech and Computer - 24th International Conference, SPECOM 2022, Proceedings
A2 - Prasanna, S.R. Mahadeva
A2 - Karpov, Alexey
A2 - Samudravijaya, K.
A2 - Agrawal, Shyam S.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Speech and Computer, SPECOM 2022
Y2 - 14 November 2022 through 16 November 2022
ER -