Patch and Model Size Characterization for On-Device Efficient-ViTs on Small Datasets Using 12 Quantitative Metrics

Research output: Contribution to journal › Article › peer-review

Abstract

Vision transformers (ViTs) have emerged as a successful alternative to convolutional neural networks (CNNs) in deep learning (DL) applications for computer vision (CV), excelling in accuracy on large-scale datasets in high-performance computing (HPC) and cloud settings. For resource-constrained mobile and edge AI devices, however, few systematic and comprehensive studies address the challenging multi-objective optimization of both device-agnostic objectives (e.g., accuracy and model size) and device-related objectives (e.g., latency, memory usage, and power/energy consumption). To address this gap, we (1) introduce five device-agnostic (DA) and seven device-related (DR) quantitative metrics, (2) use them to thoroughly characterize the effects of two ViT hyper-parameters, patch size and model size, on small datasets, and (3) propose a simple yet effective optimization technique for efficient ViTs, the hierarchical and local (HelLo) tuning method. The results show that our method achieves improvements of up to 85% in MACs, 67.2% in inference latency, 77.7% in training latency, 63.3% in GPU memory, 73.8% in energy consumption, and 263.0% in FoM, with minimal accuracy degradation (up to 2%).
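
To make the link between patch size and compute cost concrete, the following is a minimal sketch, not the paper's measurement code. It uses the standard back-of-the-envelope MAC estimate for a plain ViT encoder (biases, layer norms, softmax, patch embedding, and the classification head are ignored). The function names, the 32×32 input, and the embedding width and depth below are illustrative assumptions, not values taken from the article.

```python
def vit_token_count(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Number of tokens a square image yields for a given square patch size."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side + (1 if class_token else 0)


def encoder_macs(num_tokens: int, dim: int, depth: int, mlp_ratio: int = 4) -> int:
    """Rough multiply-accumulate count for a stack of standard ViT encoder blocks."""
    # Q, K, V and output projections: four (dim x dim) matrices applied to every token.
    projections = 4 * num_tokens * dim * dim
    # Attention scores (Q K^T) and the weighted sum over V: quadratic in token count.
    attention = 2 * num_tokens * num_tokens * dim
    # Two-layer MLP with hidden width mlp_ratio * dim.
    mlp = 2 * mlp_ratio * num_tokens * dim * dim
    return depth * (projections + attention + mlp)


if __name__ == "__main__":
    # Hypothetical setting: 32x32 inputs, embedding width 192, 12 encoder blocks.
    for patch in (2, 4, 8):
        tokens = vit_token_count(32, patch)
        macs = encoder_macs(tokens, dim=192, depth=12)
        print(f"patch={patch}: tokens={tokens}, approx. encoder MACs={macs:,}")
```

Under these assumptions, doubling the patch size cuts the token count by roughly a factor of four, which shrinks the linear terms proportionally and the attention term quadratically. Device-related metrics such as latency, memory, and energy still have to be measured on the target hardware.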

Original language: English
Pages (from-to): 25704-25722
Number of pages: 19
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

Funding

This work is partly supported by the Ministry of Science and Higher Education of the Republic of Kazakhstan under grant AP23487072 (“Leveraging IoT Mesh Networks for Machine Learning Knowledge Transfer”) and by Nazarbayev University (NU) under grant 021220FD0851 (FDCRGP).

Keywords

  • characterization
  • Deep learning
  • edge-AI
  • efficient vision transformers (ViTs)
  • embedded systems
  • mobile devices
  • multi-objective optimization
  • on-device ML

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
