Weakly supervised violence detection in surveillance video
Date
2022
Abstract
Automatic violence detection in video surveillance is essential for social and personal
security. Monitoring the large number of surveillance cameras used in public and private areas is
challenging for human operators. The manual nature of this task significantly increases the possibility
of ignoring important events due to human limitations when paying attention to multiple targets
at a time. Researchers have proposed several methods to detect violent events automatically to
overcome this problem. However, most previous studies have focused only on classifying short clips
without performing spatial localization. In this work, we address this limitation by proposing a weakly
supervised method that localizes violent actions spatially and temporally in surveillance videos using
only video-level labels. The proposed method follows a Fast R-CNN-style architecture that has
been temporally extended. First, we generate spatiotemporal proposals (action tubes) by leveraging
pre-trained person detectors, motion appearance (dynamic images), and tracking algorithms. Then,
given an input video and the action proposals, we extract spatiotemporal features using deep neural
networks. Finally, a classifier based on multiple-instance learning is trained to label each action tube
as violent or non-violent. We obtain results comparable to the state of the art on three public datasets,
Hockey Fight, RLVSD, and RWF-2000, achieving accuracies of 97.3%, 92.88%, and 88.7%, respectively.
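
The multiple-instance learning step described above can be illustrated with a minimal sketch. It treats a video as a bag of action tubes and trains only against the video-level label. The abstract does not state how tube scores are aggregated, so the max pooling used here is an assumption, and the function names (video_score, mil_loss) are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def sigmoid(x):
    """Map raw logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def video_score(tube_logits):
    """Aggregate per-tube logits into a video-level violence probability.

    Assumption: max pooling over tubes, i.e. a video is violent if at
    least one of its action tubes is violent. The source abstract does
    not specify the aggregation rule.
    """
    tube_probs = sigmoid(np.asarray(tube_logits, dtype=float))
    return float(tube_probs.max()), tube_probs


def mil_loss(tube_logits, video_label, eps=1e-7):
    """Binary cross-entropy between the video-level score and the
    video-level label (1 = violent, 0 = non-violent)."""
    p, _ = video_score(tube_logits)
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return float(-(video_label * np.log(p) + (1 - video_label) * np.log(1.0 - p)))


# Hypothetical logits for three action tubes from one video.
logits = [-2.0, 0.5, 3.0]
score, tube_probs = video_score(logits)
most_violent_tube = int(tube_probs.argmax())  # index of the tube driving the score
```

Because only the highest-scoring tube contributes to the loss under this assumption, the per-tube probabilities can be read at inference time as a spatiotemporal localization of the violent action, even though no tube-level labels were used in training.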
Keywords
Spatiotemporal violence detection, Dynamic image
Citation
CHOQUELUQUE ROMAN, D.; CÁMARA CHÁVEZ, G. Weakly supervised violence detection in surveillance video. Sensors, v. 22, n. 12, article 4502, 2022. Available at: <https://www.mdpi.com/1424-8220/22/12/4502>. Accessed: 6 Jul. 2023.