A Video Object Detection Algorithm Based on Inter-frame Motion Bounding-Box Chains

Li Min; Li Linghan; Bai Ruwen; Jiang Miao; Ren Junxing; Meng Bo; Yang Yang; Huang Zihao; Huang Weiqing

Cite this article:

Li Min, Li Linghan, Bai Ruwen, Jiang Miao, Ren Junxing, Meng Bo, Yang Yang, Huang Zihao, Huang Weiqing. A Motion-based Seq-bbox Matching Method for Video Object Detection[J]. Journal of Cyber Security, accepted.

Li Min^1,2, Li Linghan^1,2, Bai Ruwen^1,2, Jiang Miao^1,2, Ren Junxing^1,2, Meng Bo^3, Yang Yang^1,2, Huang Zihao^4, Huang Weiqing^1,2
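(1. Institute of Information Engineering, Chinese Academy of Sciences; 2. School of Cyber Security, University of Chinese Academy of Sciences; 3. Beijing Institute of Technology; 4. North China Electric Power University)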

Abstract:

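With the development of deep neural networks, deep learning-based object detection algorithms have achieved remarkable results. In practice, however, detection quality on video still needs improvement because objects are often blurred, occluded, or deformed. Current mainstream algorithms such as FGFA and Seq-NMS cannot deliver both speed and accuracy, and so fall short of practical requirements. Improving detection accuracy while preserving real-time performance remains a challenge for video object detection. This paper proposes a post-processing method based on inter-frame motion bounding-box chains: it builds on a single-frame detection algorithm and, working in the post-processing stage, introduces inter-frame motion information to enhance the detection results. We use Distance Intersection over Union (DIoU) to represent inter-frame motion information and, on the premise that the same object should have similar motion information in adjacent frames, construct the inter-frame motion bounding-box chains. Finally, a dynamic confidence averaging method is introduced to enhance the predictions. Experimental results show that, with YOLOv5 as the base detector, the algorithm reaches a mean average precision (mAP) of 73.4%, an improvement of 6.2 percentage points (from 67.2% to 73.4%), at a detection speed of 41 frames per second (fps). The algorithm balances speed and accuracy well and offers ideas for faster and more accurate video object detection.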

Keywords: video object detection; bounding box; motion information; Distance Intersection over Union (DIoU)

DOI：10.19363/J.cnki.cn10-1380/tn.2023.08.19

Received: 2021-03-09; Revised: 2021-04-26

Funding: National Key Technologies R&D Program (No. 2019YFB1005205)

A Motion-based Seq-bbox Matching Method for Video Object Detection

Li Min^1,2, Li Linghan^1,2, Bai Ruwen^1,2, Jiang Miao^1,2, Ren Junxing^1,2, Meng Bo^3, Yang Yang^1,2, Huang Zihao^4, Huang Weiqing^1,2

(1. Institute of Information Engineering, Chinese Academy of Sciences; 2. School of Cyber Security, University of Chinese Academy of Sciences; 3. Beijing Institute of Technology; 4. North China Electric Power University)

Abstract:

The development of deep neural networks has enabled deep learning-based object detection algorithms to achieve remarkable results. In practice, however, detection quality on video still needs improvement because objects are often blurred, occluded, or deformed. Current algorithms such as FGFA and Seq-NMS cannot achieve both speed and accuracy, which limits their use in practical applications. This study proposes a practical video object detection algorithm that improves detection accuracy while preserving real-time performance. Specifically, we propose a post-processing method called Motion-based Seq-bbox Matching, which builds on a single-frame detection algorithm and introduces inter-frame motion information to enhance the detection results. We use Distance Intersection over Union (DIoU) to represent the inter-frame motion information, propose that the same object should have similar motion information in adjacent frames, and combine this with a dynamic confidence averaging method to enhance the predictions. Experimental results show that, based on YOLOv5, the proposed algorithm achieves 73.4% mean average precision (mAP), an improvement of 6.2 percentage points (from 67.2% to 73.4%), at a detection speed of 41 frames per second (fps). The proposed algorithm thus balances speed and accuracy well, and the study offers ideas for developing fast and accurate video object detection algorithms.

Key words: video object detection; bounding box; motion information; Distance Intersection-over-Union
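
Illustrative sketch: the abstract uses DIoU to compare boxes in adjacent frames. In its standard form, DIoU = IoU - ρ²/c², where ρ is the distance between the two box centres and c is the diagonal of the smallest box enclosing both. The Python below is a minimal sketch of that standard measure, followed by a greedy one-to-one linking of boxes between two adjacent frames. The `(x1, y1, x2, y2)` box layout, the function names `diou` and `link_adjacent`, the greedy strategy, and the `threshold` value are all assumptions made for illustration. The abstract does not specify how the paper builds its chains or averages confidences, so this should not be read as the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def diou(a: Box, b: Box) -> float:
    """Standard Distance-IoU: IoU minus (centre distance / enclosing diagonal)^2."""
    # Intersection area (zero if the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centres.
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    rho2 = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both inputs.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw * cw + ch * ch
    return iou - rho2 / c2 if c2 > 0 else iou


def link_adjacent(
    prev: Sequence[Box], curr: Sequence[Box], threshold: float = 0.3
) -> List[Tuple[int, int]]:
    """Greedily pair boxes of two adjacent frames by descending DIoU.

    Assumed strategy for illustration only; `threshold` is an arbitrary value.
    Returns (index_in_prev, index_in_curr) pairs, each index used at most once.
    """
    scored = sorted(
        ((diou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev, used_curr = set(), set()
    links: List[Tuple[int, int]] = []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining candidates score even lower
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        links.append((i, j))
    return links
```

Unlike plain IoU, DIoU still separates candidates when boxes do not overlap, because the centre-distance term makes the score negative and increasingly so as the boxes move apart. That is one plausible reason it suits matching fast-moving objects across frames.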