One Model to Rule Them all: A Universal Transformer for Biometric Matching

Madina Abdrakhmanova, Assel Yermekova, Yuliya Barko, Vladislav Ryspayev, Medet Jumadildayev, Huseyin Atakan Varol

Research output: Contribution to journal › Article › peer-review

Abstract

This study introduces the first single-branch network designed to handle a spectrum of biometric matching scenarios, including unimodal, multimodal, cross-modal, and missing-modality settings. Our method adapts the prototypical network loss to train concurrently on audio, visual, and thermal data within a unified multimodal framework. By converting all three data types into image format, we employ the Vision Transformer (ViT) architecture with shared model parameters, enabling the encoder to map all input modalities into a unified vector space. The multimodal prototypical network loss ensures that vector representations of the same speaker lie close together regardless of their original modality. Evaluation on the SpeakingFaces and VoxCeleb datasets covers a wide range of scenarios and demonstrates the effectiveness of our approach. The trimodal model achieves an Equal Error Rate (EER) of 0.27% on the SpeakingFaces test split, surpassing all previously reported results. Moreover, with a single training run, it performs comparably to its unimodal and bimodal counterparts, covering the unimodal audio, visual, and thermal configurations as well as the audio-visual, audio-thermal, and visual-thermal configurations. In cross-modal evaluation on the VoxCeleb1 test set (audio versus visual), our approach yields an EER of 24.1%, again outperforming state-of-the-art models. These results underscore the effectiveness of our unified model across diverse biometric verification scenarios.
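The abstract describes a prototypical network loss adapted so that embeddings of the same speaker cluster together regardless of modality. The NumPy snippet below is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the embeddings are assumed to already come out of the shared ViT encoder, each speaker contributes one sample per modality in a hypothetical [audio, visual, thermal] layout, the last sample serves as the query while the others are averaged into the speaker's prototype, and squared Euclidean distance is used as in the original prototypical networks. The function names `multimodal_prototypical_loss` and `log_softmax` are illustrative.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def multimodal_prototypical_loss(embeddings):
    """Prototypical loss over a batch of speakers (illustrative sketch).

    embeddings: array of shape (n_speakers, n_samples, dim), assumed to be
    outputs of one shared encoder. Along axis 1 the samples may come from
    different modalities, e.g. [audio, visual, thermal] (hypothetical layout).
    The last sample of each speaker is the query; the others form its support set.
    """
    if embeddings.shape[1] < 2:
        raise ValueError("need at least one support and one query sample per speaker")
    support = embeddings[:, :-1, :]    # (S, M-1, D)
    queries = embeddings[:, -1, :]     # (S, D)
    prototypes = support.mean(axis=1)  # one prototype per speaker: (S, D)
    # Squared Euclidean distance from every query to every prototype: (S, S).
    diff = queries[:, None, :] - prototypes[None, :, :]
    dists = (diff ** 2).sum(axis=-1)
    # Closer prototypes get higher scores; query i should match prototype i.
    log_probs = log_softmax(-dists)
    return float(-np.mean(np.diag(log_probs)))


# Example: 4 speakers, each with an audio, visual, and thermal embedding.
rng = np.random.default_rng(0)
speaker_centers = 3.0 * np.eye(4)  # well-separated identities in a 4-D space
batch = speaker_centers[:, None, :] + 0.01 * rng.standard_normal((4, 3, 4))
print(multimodal_prototypical_loss(batch))  # small for well-separated speakers
```

In this sketch the thermal embedding is always the query and the prototype is built from the audio and visual embeddings. Rotating which modality plays the query role, or sampling supports and queries across modalities at random, would pull all three modalities toward a shared per-speaker point, which is the property that cross-modal and missing-modality matching relies on. The distance function and episode sampling used in the paper may differ.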

Original language: English
Pages (from-to): 96729-96739
Number of pages: 11
Journal: IEEE Access
Volume: 12
Publication status: Published - 2024

Keywords

  • Biometric matching
  • cross-modal matching
  • face verification
  • face-audio association
  • metric learning
  • multimodal verification
  • speaker verification
  • transformer

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
