Combining structural analysis and computer vision techniques for automatic speech summarization

Mustafa Sert, Buyurman Baykal, Adnan Yazici

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Similar to the verse and chorus sections that appear as repetitive structures in musical audio, the key concept (or topic) of some speech recordings (e.g., presentations and lectures) may also repeat over time. Accurate detection of these repetitions may therefore help automatic speech summarization. Based on this motivation, we consider the applicability of music structural analysis methods to speech summary generation. Our method transforms a 1-D time-domain speech signal into a 2-D image representation, namely a (dis)similarity matrix, and detects possible repetitions within the matrix using computer vision techniques. The method does not transcribe the speech signal into words, phrases, or sentences; hence, it can be regarded as a speech-to-speech summarization method, in which summarization results are presented as speech instead of text. Furthermore, the method does not need prior knowledge of the language or grammar of the speech signal. Experiments show that our method can capture the main theme of speech signals when compared with the ideal transcription sections defined by experts, and computational analysis shows that the proposed method performs well.
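
The pipeline outlined in the abstract (1-D signal, then 2-D similarity matrix, then repetition detection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or the computer vision step, so the log-magnitude spectra, the cosine similarity, and the diagonal-stripe search below are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import numpy as np


def frame_features(signal, frame_len=256, hop=128):
    """Cut a 1-D signal into overlapping windowed frames and return
    one log-magnitude spectrum per frame (assumed feature choice)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def self_similarity(features):
    """Cosine similarity between every pair of frames: the 2-D 'image'."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against silent frames
    return unit @ unit.T


def best_repetition_lag(sim, min_lag=8):
    """Return the lag (in frames) whose off-diagonal has the highest mean
    similarity. A repeated segment shows up as a bright stripe parallel to
    the main diagonal; this stripe search stands in for the vision step."""
    n = sim.shape[0]
    lags = range(min_lag, n - min_lag)  # skip very short diagonals
    scores = [np.mean(np.diagonal(sim, offset=lag)) for lag in lags]
    return lags[int(np.argmax(scores))]


def make_toy_signal(sample_rate=8000, seg_len=4096):
    """Toy 'recording' with layout A, B, A: a chirp, noise, the same chirp."""
    t = np.arange(seg_len) / sample_rate
    sweep = (3000.0 - 200.0) / (seg_len / sample_rate)  # Hz per second
    seg_a = np.sin(2 * np.pi * (200.0 * t + 0.5 * sweep * t**2))
    seg_b = np.random.default_rng(0).standard_normal(seg_len)
    return np.concatenate([seg_a, seg_b, seg_a])


if __name__ == "__main__":
    sim = self_similarity(frame_features(make_toy_signal()))
    print("similarity matrix shape:", sim.shape)
    print("strongest repetition lag (frames):", best_repetition_lag(sim))
```

In the toy signal the two copies of segment A start 8192 samples apart, which is 64 hops at the default hop size, so that is the lag the stripe search is meant to recover. Real speech repeats a topic with different wording and timing, so a practical system would need more robust features and a more tolerant stripe detector than this sketch.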

Original language: English
Title of host publication: Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008
Pages: 515-520
Number of pages: 6
Publication status: Published - 2008
Event: 10th IEEE International Symposium on Multimedia, ISM 2008 - Berkeley, CA, United States
Duration: Dec 15 2008 to Dec 17 2008

Publication series

Name: Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008

Conference

Conference: 10th IEEE International Symposium on Multimedia, ISM 2008
Country/Territory: United States
City: Berkeley, CA
Period: 12/15/08 to 12/17/08

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Electrical and Electronic Engineering
