Optimizing Group Activity Recognition With Actor Relation Graphs and GCN-LSTM Architectures

Tejonidhi, M. R. and Aravind, C. V. and Aruna Kumar, S. V. and Madhu, C. K. and Vinod, A. M. (2025) Optimizing Group Activity Recognition With Actor Relation Graphs and GCN-LSTM Architectures. IEEE Access, 13. pp. 55957-55969.

PDF
Optimizing_Group_Activity_Recognition_With_Actor_Relation_Graphs_and_GCN-LSTM_Architectures.pdf - Published Version
Restricted to Registered users only

Abstract

Recognizing group activities from human behavior and interactions is a prominent problem in computer vision research, with applications in security surveillance, healthcare monitoring, and human-computer interaction systems. Accurately interpreting complex group dynamics nevertheless remains difficult. Traditional video-based activity recognition methods struggle with persistent obstacles such as environmental noise, background clutter, and inter-class similarity among activity patterns. Recent work in this field has largely focused on spatial feature extraction, which alone is inadequate for thorough group activity analysis. To address these limitations, we propose a novel deep learning framework that captures both the temporal and the spatial characteristics of group interactions. The architecture uses a Convolutional Neural Network (CNN) with Inception-V3 as the backbone for initial feature extraction. An Actor Relation Graph (ARG) is then built using Zero Normalized Cross Correlation (ZNCC) to represent both appearance-based and positional relationships among participants. By integrating the ARG with a hybrid model that combines a Graph Convolutional Network (GCN), Long Short-Term Memory (LSTM), and attention mechanisms, our approach significantly improves the extraction of spatial and relational features compared with conventional techniques. Experiments on two benchmark datasets, the Collective Activity Dataset (CAD) and the Volleyball dataset, demonstrate the efficacy of the framework. The proposed model achieves state-of-the-art performance, with prediction accuracies of 94.32% on CAD and 94.47% on Volleyball, surpassing existing group activity recognition methods.
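
As a rough illustration of the relation-graph idea described in the abstract, the sketch below computes pairwise ZNCC scores between per-actor appearance vectors and passes them through a single graph-convolution step. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`zncc_matrix`, `row_softmax`, `gcn_layer`), the feature dimensions, the random inputs, and the row-softmax normalization are all hypothetical. The positional term of the ARG, the Inception-V3 backbone, and the LSTM and attention stages are omitted.

```python
import numpy as np


def zncc_matrix(features, eps=1e-8):
    """Pairwise ZNCC between rows of `features` (N actors x D dims).

    Each row is shifted to zero mean and scaled to unit standard
    deviation; the score for a pair is the mean of the elementwise
    product of the two standardized rows, so it lies in about [-1, 1].
    """
    centered = features - features.mean(axis=1, keepdims=True)
    normed = centered / (centered.std(axis=1, keepdims=True) + eps)
    return normed @ normed.T / features.shape[1]


def row_softmax(scores):
    """Turn raw relation scores into row-stochastic edge weights (assumed choice)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(A @ H @ W)."""
    return np.maximum(adjacency @ features @ weights, 0.0)


# Hypothetical inputs: 5 actors with 16-dimensional appearance features.
rng = np.random.default_rng(0)
actor_features = rng.normal(size=(5, 16))

relation_scores = zncc_matrix(actor_features)      # (5, 5) appearance relations
adjacency = row_softmax(relation_scores)           # normalized graph edges
weights = rng.normal(scale=0.1, size=(16, 8))      # stand-in for learned weights
updated = gcn_layer(adjacency, actor_features, weights)

print(relation_scores.shape, updated.shape)
```

In a full pipeline the updated actor features for each frame would presumably feed a temporal model such as an LSTM. The abstract does not specify how that stage is wired, so it is left out here.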

Item Type:	Article
Uncontrolled Keywords:	Group activity recognition, actor relation graph, spatio-temporal feature extraction, GCN-LSTM-Attention
Subjects:	000 Computer science, information and general works > 02 Computer Science
Divisions:	Food Engineering
Depositing User:	Somashekar K S
Date Deposited:	28 Nov 2025 09:33
Last Modified:	28 Nov 2025 09:33
URI:	http://ir.cftri.res.in/id/eprint/20130
