TY - JOUR
T1 - Enhancing skeleton-based action recognition with hybrid real and GAN-generated datasets
AU - Islamgozhayev, Talgat
AU - Amirgaliyev, Beibut
AU - Kozhirbayev, Zhanibek
PY - 2024
Y1 - 2024
N2 - This research addresses the critical challenge of recognizing mutual actions involving multiple individuals, an important task for applications such as video surveillance, human-computer interaction, autonomous systems, and behavioral analysis. Identifying these actions from 3D skeleton motion sequences poses significant challenges due to the necessity of accurately capturing intricate spatial and temporal patterns in diverse, dynamic, and often unpredictable environments. To tackle this, a robust neural network framework was developed that combines Convolutional Neural Networks (CNNs) for efficient spatial feature extraction with Long Short-Term Memory (LSTM) networks to model temporal dependencies over extended sequences. A distinguishing feature of this study is the creation of a hybrid dataset that combines real-world skeleton motion data with synthetically generated samples, produced using Generative Adversarial Networks (GANs). This dataset enriches variability, enhances generalization, and mitigates data scarcity challenges. Experimental findings across three different network architectures demonstrate that our method significantly enhances recognition accuracy, mainly due to the integration of CNNs and LSTMs alongside the broadened dataset. Our approach successfully identifies complex interactions and ensures consistent performance across different perspectives and environmental conditions. The improved reliability in recognition indicates that this framework can be effectively utilized in practical applications such as security systems, crowd monitoring, and other areas where precise detection of mutual actions is critical, particularly in real-time and dynamic environments.
DO - 10.15587/1729-4061.2024.317092
M3 - Article
SN - 1729-3774
SP - 14
EP - 22
JO - Eastern-European Journal of Enterprise Technologies
JF - Eastern-European Journal of Enterprise Technologies
ER -