Authors: Benoit, Paul; Bresson, Marc; Xing, Yang; Guo, Weisi; Tsourdos, Antonios
Dates: 2024-03-22; 2024-03-22; 2024-03-01
Citation: Benoit P, Bresson M, Xing Y, et al., (2024) Real-time vision-based violent actions detection through CCTV cameras with pose estimation. In: 2023 IEEE Smart World Congress (SWC), 28-31 August 2023, Portsmouth, UK, pp. 844-849
ISBN: 979-8-3503-1980-4
DOI: https://doi.org/10.1109/SWC57546.2023.10448959
URI: https://dspace.lib.cranfield.ac.uk/handle/1826/21077

Abstract: In large structures under video surveillance, or when a place is crowded, a CCTV operator cannot monitor hundreds of people on different video streams. This paper presents a proof of concept for a real-time vision-based system for detecting violent actions through CCTV cameras with pose estimation. The proposed system uses a combination of computer vision techniques, including pose estimation, object tracking, and a deep learning algorithm based on time-series features, to accurately identify violent actions in real time. Our features are based on a fixed-size rolling window that computes the position of each person’s limbs along with their velocity. The proposed pipeline achieves an accuracy of 92% with an overall latency of around 0.07 seconds per frame on an RTX 3060 Mobile GPU, making it a powerful tool for enhancing public safety and security. This system can be deployed in a wide range of scenarios, including public places, transportation hubs, and other critical infrastructure, to provide real-time alerts and facilitate rapid response in case of violent incidents.

Language: en-UK
Rights: Attribution-NonCommercial 4.0 International (http://creativecommons.org/licenses/by-nc/4.0/)
Keywords: Violence detection; pose estimation; near-real time; YOLOv8; deep SORT
Title: Real-time vision-based violent actions detection through CCTV cameras with pose estimation
Type: Conference paper
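
The rolling-window features mentioned in the abstract (limb positions plus their velocities over a fixed number of frames) could look roughly like the sketch below. This is not the authors' implementation. The class name `RollingPoseWindow`, the window size, the frame rate, the `(K, 2)` keypoint layout, and the use of a simple finite difference for velocity are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


class RollingPoseWindow:
    """Hypothetical per-person buffer of 2D keypoints (illustrative only)."""

    def __init__(self, window_size=5, fps=15.0):
        # window_size and fps are assumed values, not taken from the paper.
        self.window_size = window_size
        self.dt = 1.0 / fps
        # deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=window_size)

    def push(self, keypoints):
        """Add one frame of keypoints, shape (K, 2) as (x, y) per joint."""
        self.frames.append(np.asarray(keypoints, dtype=np.float64))

    def ready(self):
        """True once the window holds window_size frames."""
        return len(self.frames) == self.window_size

    def features(self):
        """Return a (window_size, K * 4) array of [x, y, vx, vy] per joint."""
        if not self.ready():
            raise ValueError("window is not full yet")
        pos = np.stack(self.frames)                # (T, K, 2)
        vel = np.zeros_like(pos)                   # first frame has zero velocity
        vel[1:] = (pos[1:] - pos[:-1]) / self.dt   # backward finite difference
        feats = np.concatenate([pos, vel], axis=-1)  # (T, K, 4)
        return feats.reshape(self.window_size, -1)
```

In a pipeline of the kind the abstract describes, one such buffer would be kept per tracked person, and the resulting array would be passed to the time-series classifier once `ready()` returns true.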