TY - GEN
T1 - Structural and semantic modeling of audio for content-based querying and browsing
AU - Sert, Mustafa
AU - Baykal, Buyurman
AU - Yazici, Adnan
PY - 2006
Y1 - 2006
N2 - A typical content-based audio management system deals with three aspects, namely audio segmentation and classification, audio analysis, and content-based retrieval of audio. In this paper, we integrate the three aspects of content-based audio management into a single framework and propose an efficient method for flexible querying and browsing of auditory data. More specifically, we utilize two robust feature sets, namely MPEG-7 Audio Spectrum Flatness (ASF) and Mel Frequency Cepstral Coefficients (MFCC), as the underlying features in order to improve content-based retrieval accuracy, since each feature has advantages for distinct types of audio (e.g., music and speech). The proposed system provides a wide range of opportunities to query and browse audio data by content, such as querying and browsing for a chorus section, sound effects, and query-by-example. In addition, clients can express their queries in the form of point, range, and k-nearest neighbor queries, which are particularly significant in the multimedia domain.
AB - A typical content-based audio management system deals with three aspects, namely audio segmentation and classification, audio analysis, and content-based retrieval of audio. In this paper, we integrate the three aspects of content-based audio management into a single framework and propose an efficient method for flexible querying and browsing of auditory data. More specifically, we utilize two robust feature sets, namely MPEG-7 Audio Spectrum Flatness (ASF) and Mel Frequency Cepstral Coefficients (MFCC), as the underlying features in order to improve content-based retrieval accuracy, since each feature has advantages for distinct types of audio (e.g., music and speech). The proposed system provides a wide range of opportunities to query and browse audio data by content, such as querying and browsing for a chorus section, sound effects, and query-by-example. In addition, clients can express their queries in the form of point, range, and k-nearest neighbor queries, which are particularly significant in the multimedia domain.
UR - http://www.scopus.com/inward/record.url?scp=33746255002&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746255002&partnerID=8YFLogxK
U2 - 10.1007/11766254_27
DO - 10.1007/11766254_27
M3 - Conference contribution
AN - SCOPUS:33746255002
SN - 3540346384
SN - 9783540346388
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 319
EP - 330
BT - Flexible Query Answering Systems - 7th International Conference, FQAS 2006, Proceedings
PB - Springer Verlag
T2 - 7th International Conference on Flexible Query Answering Systems, FQAS 2006
Y2 - 7 June 2006 through 10 June 2006
ER -