TY - GEN
T1 - Reusing weights in subword-aware neural language models
AU - Assylbekov, Zhenisbek
AU - Takhanov, Rustem
N1 - Funding Information:
We gratefully acknowledge the NVIDIA Corporation for their donation of the Titan X Pascal GPU used for this research. The work of Zhenisbek Assylbekov has been funded by the Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan, contract # 346/018-2018/33-28, IRN AP05133700. The authors would like to thank anonymous reviewers for their valuable feedback, and Dr. J. N. Washington for proofreading an early version of the paper.
Publisher Copyright:
© 2018 The Association for Computational Linguistics.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multilayer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%–87% fewer parameters.
AB - We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multilayer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%–87% fewer parameters.
UR - http://www.scopus.com/inward/record.url?scp=85083481621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083481621&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85083481621
T3 - NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
SP - 1413
EP - 1423
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
T2 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018
Y2 - 1 June 2018 through 6 June 2018
ER -