TY - GEN
T1 - Extending Multilingual ASR to New Languages Using Supplementary Encoder and Decoder Components
AU - Khassanov, Yerbolat
AU - Chen, Zhipeng
AU - Chen, Tianfeng
AU - Chong, Tze Yuang
AU - Li, Wei
AU - Lu, Lu
AU - Ma, Zejun
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Extending multilingual automatic speech recognition (mASR) systems to new languages poses challenges, particularly when training data for existing languages is limited or unavailable. To tackle this issue, we suggest utilizing supplementary encoder and decoder components. Specifically, we propose appending and fine-tuning a distinct decoder designed for new languages, while preserving the parameters of existing languages to minimize disruption to their performance. Furthermore, we advocate attaching an additional encoder component to enhance acoustic representation learning for new languages, resulting in substantial improvements in word error rate performance. Our experimental findings demonstrate the effectiveness of the proposed methods for the task of extending language support within mASR systems.
KW - ASR
KW - language extension
KW - Multilingual
UR - http://www.scopus.com/inward/record.url?scp=85195376250&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195376250&partnerID=8YFLogxK
U2 - 10.1109/ICASSP48485.2024.10446800
DO - 10.1109/ICASSP48485.2024.10446800
M3 - Conference contribution
AN - SCOPUS:85195376250
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 10586
EP - 10590
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -