TY - GEN
T1 - Audio feature and classifier analysis for efficient recognition of environmental sounds
AU - Okuyucu, Cigdem
AU - Sert, Mustafa
AU - Yazici, Adnan
PY - 2013
Y1 - 2013
N2 - Environmental sounds (ES) have distinctive characteristics, such as an unstructured nature and typically noise-like, flat spectra, which make the recognition task more difficult than for speech or music. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature set to yield higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter, are detected and tested based on eleven (11) audio features (MPEG-7 family, ZCR, MFCC, and combinations) using HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set, with an average F-measure of 80.6%.
AB - Environmental sounds (ES) have distinctive characteristics, such as an unstructured nature and typically noise-like, flat spectra, which make the recognition task more difficult than for speech or music. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature set to yield higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter, are detected and tested based on eleven (11) audio features (MPEG-7 family, ZCR, MFCC, and combinations) using HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set, with an average F-measure of 80.6%.
KW - Environmental sound classification
KW - HMM
KW - MFCC
KW - MPEG-7
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=84900675286&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84900675286&partnerID=8YFLogxK
U2 - 10.1109/ISM.2013.29
DO - 10.1109/ISM.2013.29
M3 - Conference contribution
AN - SCOPUS:84900675286
SN - 9780769551401
T3 - Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013
SP - 125
EP - 132
BT - Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013
T2 - 15th IEEE International Symposium on Multimedia, ISM 2013
Y2 - 9 December 2013 through 11 December 2013
ER -