Reusing Weights in Subword-aware Neural Language Models

Research output: Contribution to journal › Article


Abstract

We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.
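The abstract describes the weight-reuse idea only at a high level. As a rough illustration of the simplest instance of the technique, the sketch below ties an input subword embedding matrix to the output softmax projection in a PyTorch-style language model. The class name, layer sizes, and the LSTM-over-subwords setup are assumptions for illustration only; the paper's models compose subword embeddings into word embeddings through multiple layers and, per the stated principle, tie those layers consecutively bottom-up when reusing them at output.

```python
import torch
import torch.nn as nn

class TiedSubwordLM(nn.Module):
    """Minimal sketch (not the authors' architecture): a subword-level LM whose
    output projection reuses the input subword embedding matrix."""

    def __init__(self, n_subwords: int, emb_dim: int, hidden_dim: int):
        super().__init__()
        self.subword_emb = nn.Embedding(n_subwords, emb_dim)     # input embeddings
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.to_emb = nn.Linear(hidden_dim, emb_dim)             # map back to embedding space
        self.out = nn.Linear(emb_dim, n_subwords, bias=False)    # output softmax weights
        # Weight reuse: the output projection shares the input embedding matrix,
        # removing n_subwords * emb_dim parameters.
        self.out.weight = self.subword_emb.weight

    def forward(self, subword_ids: torch.Tensor) -> torch.Tensor:
        x = self.subword_emb(subword_ids)      # (batch, seq_len, emb_dim)
        h, _ = self.rnn(x)                     # (batch, seq_len, hidden_dim)
        return self.out(self.to_emb(h))        # logits over the subword vocabulary

# Example usage with made-up sizes:
model = TiedSubwordLM(n_subwords=5000, emb_dim=300, hidden_dim=512)
logits = model(torch.randint(0, 5000, (8, 35)))   # -> shape (8, 35, 5000)
```

The parameter savings quoted in the abstract come from this kind of sharing applied to the larger subword-to-word composition layers, not just to a single embedding matrix as in this simplified sketch.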
Original language: Undefined/Unknown
Journal: Proceedings of NAACL-HLT 2018
Publication status: Published - Feb 23 2018

Keywords

  • cs.CL
  • cs.NE
  • stat.ML
  • 68T50
  • I.2.7

Cite this

@article{4e26d53d292942bbad5fc5b02eb6a069,
title = "Reusing Weights in Subword-aware Neural Language Models",
abstract = "We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20{\%}-87{\%} fewer parameters.",
keywords = "cs.CL, cs.NE, stat.ML, 68T50, I.2.7",
author = "Zhenisbek Assylbekov and Rustem Takhanov",
note = "accepted to NAACL 2018",
year = "2018",
month = "2",
day = "23",
language = "Undefined/Unknown",
journal = "Proceedings of NAACL-HLT 2018",

}
