Optimizing Group Activity Recognition With Actor Relation Graphs and GCN-LSTM Architectures
Tejonidhi, M. R. and Aravind, C. V. and Aruna Kumar, S. V. and Madhu, C. K. and Vinod, A. M. (2025) Optimizing Group Activity Recognition With Actor Relation Graphs and GCN-LSTM Architectures. IEEE Access, 13. pp. 55957-55969.
PDF: Optimizing_Group_Activity_Recognition_With_Actor_Relation_Graphs_and_GCN-LSTM_Architectures.pdf (Published Version, 1MB). Restricted to registered users only.
Abstract
Understanding and recognizing group activities from human behavior and interactions is a prominent problem in computer vision research, with applications in security surveillance, healthcare monitoring, and human-computer interaction systems. However, accurately interpreting complex group dynamics remains difficult. Traditional video-based activity recognition methods struggle with persistent obstacles such as environmental noise, background clutter, and inter-class similarity between activity patterns. Recent advances in this field have focused largely on spatial feature extraction, which on its own is inadequate for thorough group activity analysis. To address these limitations, we propose a deep learning framework that captures both the temporal and the spatial characteristics of group interactions. Our architecture uses a Convolutional Neural Network (CNN) with Inception-V3 as the backbone for initial feature extraction. This is complemented by an Actor Relation Graph (ARG) constructed with Zero Normalized Cross Correlation (ZNCC), which represents both appearance-based and positional relationships among participants. By integrating the ARG with a hybrid model that combines a Graph Convolutional Network (GCN), Long Short-Term Memory (LSTM), and attention mechanisms, our approach significantly improves the extraction of spatial and relational features compared with conventional techniques. Experiments on two benchmark datasets, the Collective Activity Dataset (CAD) and the Volleyball dataset, demonstrate the efficacy of the framework. The proposed model achieves state-of-the-art performance, with prediction accuracies of 94.32% and 94.47% on the CAD and Volleyball datasets, respectively, surpassing existing group activity recognition methods.
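The abstract does not specify how ZNCC scores become graph edges, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' implementation. It assumes each actor is described by an appearance feature vector (for example, pooled backbone features) and a 2-D position. The names `zncc`, `actor_relation_graph`, `gcn_layer`, and `dist_threshold` are illustrative. ZNCC between vectors a and b is sum((a_i - mean(a)) * (b_i - mean(b))) / sqrt(sum((a_i - mean(a))^2) * sum((b_i - mean(b))^2)), a value in [-1, 1]. The positional relation is approximated here by a distance mask. That choice is an assumption borrowed from common ARG designs and is not taken from this paper. The LSTM and attention stages are omitted.

```python
import math

def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-length vectors, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Constant vectors have zero variance; treat them as uncorrelated.
    return num / den if den > 0 else 0.0

def actor_relation_graph(features, positions, dist_threshold):
    """Hypothetical ARG: ZNCC appearance scores, masked by actor distance
    (assumed positional relation), then row-normalized with a softmax."""
    n = len(features)
    adj = []
    for i in range(n):
        logits = []
        for j in range(n):
            d = math.dist(positions[i], positions[j])
            # Pairs farther apart than the threshold get no edge (-inf logit).
            logits.append(zncc(features[i], features[j]) if d <= dist_threshold else -math.inf)
        m = max(logits)  # the self-pair (distance 0) keeps this finite
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        adj.append([e / s for e in exps])
    return adj

def gcn_layer(adj, h, w):
    """One graph-convolution step: ReLU(A @ H @ W) on plain nested lists."""
    ah = [[sum(adj[i][k] * h[k][c] for k in range(len(h))) for c in range(len(h[0]))]
          for i in range(len(adj))]
    return [[max(0.0, sum(row[k] * w[k][c] for k in range(len(w)))) for c in range(len(w[0]))]
            for row in ah]

# Toy usage: three actors with 3-D appearance features; actor 2 stands far away.
features = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.1], [3.0, 1.0, 0.0]]
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
adj = actor_relation_graph(features, positions, dist_threshold=2.0)
identity = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
out = gcn_layer(adj, features, identity)
print(adj[2])  # [0.0, 0.0, 1.0]
```

Masking distant pairs with -inf before the softmax gives them exactly zero weight, so an isolated actor aggregates only its own features. In a full spatio-temporal model, the per-frame GCN output would then feed the temporal (LSTM and attention) stages.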
| Field | Value |
|---|---|
| Item Type | Article |
| Uncontrolled Keywords | Group activity recognition, actor relation graph, spatio-temporal feature extraction, GCN-LSTM-Attention |
| Subjects | 000 Computer science, information and general works > 02 Computer Science |
| Divisions | Food Engineering |
| Depositing User | Somashekar K S |
| Date Deposited | 28 Nov 2025 09:33 |
| Last Modified | 28 Nov 2025 09:33 |
| URI | http://ir.cftri.res.in/id/eprint/20130 |

