Recognizing isolated words with minimum distance similarity metric padding

Mitar Milacic, Alex Pappachen James, Sima Dimitrijev

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


Automated recognition of human speech commands under unconstrained, noisy conditions and with a limited number of training samples is a challenging problem of interest for smart devices and systems. In practice, noise cannot be removed without losing class-discriminative information in the speech signal. Moreover, any attempt to improve signal quality places an additional computational burden on state-of-the-art speech command recognition systems. In this paper, we propose a low-level word processing system that uses mean-variance normalised frequency-time spectrograms and a new similarity measure that compensates for feature length mismatches, such as those resulting from pronunciation variations in speech segments. We find that padding a local similarity matrix with zero similarity values, so that the effects of a mismatch in the length of speech spectrograms are disregarded, improves word recognition accuracy and reduces between-class non-discriminative signals. Compared with state-of-the-art approaches to spectrogram comparison such as dynamic time warping (DTW), the proposed method, when tested on the TIMIT database, shows improved recognition accuracy, robustness to noise, lower computational requirements, and scalability to large word problems.
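The abstract does not give the exact form of the similarity measure, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes spectrograms are NumPy arrays of shape (frequency bins, time frames), that normalisation is applied over the whole spectrogram, and that local similarity is `exp(-|difference|)` per time-frequency cell. The function names `mean_variance_normalise`, `padded_similarity`, and `recognise` are hypothetical. Frames with no counterpart in the shorter spectrogram keep a similarity of zero instead of being warped or aligned.

```python
import numpy as np


def mean_variance_normalise(spec, eps=1e-8):
    """Shift to zero mean and scale to unit variance (assumed: global statistics)."""
    spec = np.asarray(spec, dtype=float)
    return (spec - spec.mean()) / (spec.std() + eps)


def padded_similarity(a, b):
    """Mean of a local similarity matrix, zero-padded to the longer length.

    Assumption: local similarity is exp(-|a - b|) per cell, which lies in (0, 1].
    Cells beyond the shorter spectrogram stay at zero similarity.
    """
    a = mean_variance_normalise(a)
    b = mean_variance_normalise(b)
    if a.shape[0] != b.shape[0]:
        raise ValueError("spectrograms must have the same number of frequency bins")
    short, long_ = sorted((a.shape[1], b.shape[1]))
    local = np.zeros((a.shape[0], long_))  # zero similarity padding
    local[:, :short] = np.exp(-np.abs(a[:, :short] - b[:, :short]))
    return float(local.mean())


def recognise(query, templates):
    """Return the label of the template most similar to the query."""
    return max(templates, key=lambda label: padded_similarity(query, templates[label]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    yes = rng.normal(size=(8, 20))  # synthetic stand-ins for word spectrograms
    no = rng.normal(size=(8, 25))
    noisy_yes = yes + 0.01 * rng.normal(size=yes.shape)
    print(recognise(noisy_yes, {"yes": yes, "no": no}))
```

Unlike DTW, this sketch performs no alignment search: each comparison is a single pass over the overlapping cells, which is one way the lower computational cost mentioned in the abstract could arise.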

Original language: English
Pages (from-to): 2933-2939
Number of pages: 7
Journal: Journal of Intelligent and Fuzzy Systems
Issue number: 4
Publication status: Published - 2017


Keywords

  • isolated words
  • mean-variance filters
  • metric padding
  • similarity measure
  • speech recognition
  • word recognition

ASJC Scopus subject areas

  • Statistics and Probability
  • Engineering(all)
  • Artificial Intelligence
