Visual and auditory data fusion for energy-efficient and improved object recognition in wireless multimedia sensor networks

Murat Koyuncu, Adnan Yazici, Muhsin Civelek, Ahmet Cosar, Mustafa Sert

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)


Automatic threat classification without human intervention is a popular research topic in wireless multimedia sensor networks (WMSNs), especially in surveillance applications. This paper explores the effect of fusing audio-visual multimedia and scalar data collected by the sensor nodes in a WMSN for energy-efficient and accurate object detection and classification. To that end, we implemented a wireless multimedia sensor node with video and audio capturing and processing capabilities in addition to ordinary scalar sensors. The multimedia sensors are kept in sleep mode to save energy until they are activated by the scalar sensors, which are always active. The object recognition results obtained from the video and audio applications are fused to increase the object recognition performance of the sensor node. Final results are forwarded to the sink in text format, which greatly reduces the size of the data transmitted in the network. Performance test results of the implemented prototype system show that fusing audio data with visual data significantly improves the automatic object recognition capability of a sensor node. Since auditory data requires less processing power than visual data, the overhead of processing the auditory data is not high, and it helps to extend the network lifetime of WMSNs.
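The node workflow described in the abstract (scalar-triggered wake-up, per-modality recognition, fusion, text-only reporting) can be illustrated with a minimal Python sketch. The abstract does not specify the fusion rule, the message format, or any function names, so everything below is an assumption made for illustration: `Recognition`, `fuse`, `handle_scalar_reading`, the weights 0.6 and 0.4, and the `label,confidence` text format are all hypothetical, and the weighted score-level fusion shown here is a generic technique, not necessarily the one used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recognition:
    """Result of one recognizer (hypothetical structure)."""
    label: str
    confidence: float  # assumed to lie in [0, 1]


def fuse(visual: Optional[Recognition],
         audio: Optional[Recognition],
         w_visual: float = 0.6,
         w_audio: float = 0.4) -> Optional[Recognition]:
    """Weighted score-level fusion; the weights are illustrative assumptions."""
    scores: dict[str, float] = {}
    for rec, weight in ((visual, w_visual), (audio, w_audio)):
        if rec is not None:
            # Accumulate weighted confidence per class label.
            scores[rec.label] = scores.get(rec.label, 0.0) + weight * rec.confidence
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return Recognition(best, scores[best])


def handle_scalar_reading(triggered: bool,
                          classify_video: Callable[[], Optional[Recognition]],
                          classify_audio: Callable[[], Optional[Recognition]]
                          ) -> Optional[str]:
    """Wake the multimedia pipeline only when a scalar sensor fires.

    Returns a short text message for the sink, or None if nothing is sent.
    """
    if not triggered:
        return None  # camera and microphone stay asleep to save energy
    fused = fuse(classify_video(), classify_audio())
    if fused is None:
        return None
    # Only a compact text result leaves the node, not raw audio or video.
    return f"{fused.label},{fused.confidence:.2f}"


if __name__ == "__main__":
    message = handle_scalar_reading(
        True,
        lambda: Recognition("human", 0.5),
        lambda: Recognition("human", 0.9),
    )
    print(message)
```

When both recognizers agree, their weighted scores add up; when they disagree, the label with the larger weighted score wins. In this sketch the classifier callbacks are never invoked unless the scalar trigger fires, which mirrors the sleep-mode behaviour the abstract describes.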

Original language: English
Article number: 8565958
Pages (from-to): 1839-1849
Number of pages: 11
Journal: IEEE Sensors Journal
Issue number: 5
Publication status: Published - Mar 1 2019


Keywords

  • Object detection
  • Visual and auditory data fusion
  • Wireless multimedia sensor
  • WMSN

ASJC Scopus subject areas

  • Instrumentation
  • Electrical and Electronic Engineering
