TY - JOUR
T1 - Recognizing isolated words with minimum distance similarity metric padding
AU - Milacic, Mitar
AU - James, Alex Pappachen
AU - Dimitrijev, Sima
N1 - Publisher Copyright:
© 2017-IOS Press and the authors. All rights reserved.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2017
Y1 - 2017
N2 - Automated processing and recognition of human speech commands under unconstrained and noisy recognition situations with a limited number of training samples is a challenging problem of interest to smart devices and systems. In practice, it is impossible to remove noise without losing class-discriminative information in the speech signals. Also, any attempt to improve signal quality places an additional burden on the computational capacity of state-of-the-art speech command recognition systems. In this paper, we propose a low-level word processing system using mean-variance normalised frequency-time spectrograms and a new similarity measure that compensates for feature length mismatches such as those resulting from pronunciation variations in speech segments. We find that padding a local similarity matrix with zero similarity values, to disregard the effects of a mismatch in the length of speech spectrograms, results in improved word recognition accuracies and a reduction in between-class non-discriminative signals. Compared with state-of-the-art approaches to spectrogram comparison such as dynamic time warping (DTW), the proposed method, when tested using the TIMIT database, shows improved recognition accuracies, robustness to noise, lower computational requirements, and scalability to large word problems.
KW - isolated words
KW - mean-variance filters
KW - metric padding
KW - similarity measure
KW - speech recognition
KW - word recognition
UR - http://www.scopus.com/inward/record.url?scp=85016803037&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016803037&partnerID=8YFLogxK
U2 - 10.3233/JIFS-169236
DO - 10.3233/JIFS-169236
M3 - Article
AN - SCOPUS:85016803037
VL - 32
SP - 2933
EP - 2939
JO - Journal of Intelligent and Fuzzy Systems
JF - Journal of Intelligent and Fuzzy Systems
SN - 1064-1246
IS - 4
ER -