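Abstract:
Although deep-learning-based object detection has made progress in traffic scenes, the trade-off between multi-object detection accuracy and speed in complex traffic scenes remains a challenge. Most methods that raise accuracy are parameter-intensive and greatly increase the number of model parameters. To address this problem, a sparse-parameter model based on YOLOv8 is proposed that improves recall and detection precision while reducing the parameter count. First, the Simple Attention Mechanism (SimAM) is used to build a stronger backbone network for feature extraction. Second, a Lightweight Content-Aware ReAssembly of Features (L-CARAFE) module is proposed to replace the up-sampling operation, aggregating contextual information over a larger receptive field. Finally, multiple decoupled heads with sparse parameters improve detection precision while reducing the parameter count. Given the complexity of traffic scenes, the model's effectiveness is verified on the KITTI dataset and its generalisation on the COCO dataset. The model substantially improves recall and mean Average Precision (mAP) on both public datasets: the nano model improves recall and mAP on the KITTI dataset by 3.1% and 0.9%, respectively, with a parameter count of 2.95, and the small model reaches an mAP@0.5 of 60.6% on the COCO dataset.
Keywords: traffic scenes; object detection; parameter sparsification; attention mechanism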
DOI:10.20079/j.issn.1001-893x.240415001 |
|
Funding: National Natural Science Foundation of China (62176034)
|
Parameter Sparse Vehicle Detection for Complex Traffic Scene Images |
HAN Xuejuan, QU Zhong
(1. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. School of Accounting, Xinjiang University of Finance and Economics, Urumqi 830012, China)
Abstract:
Although deep-learning-based object detection has made progress in traffic scenes, balancing multi-object detection accuracy against speed in complex traffic scenes remains a challenge. Most methods for improving accuracy are parameter-intensive and greatly increase the number of model parameters. To address this challenge, a sparse-parameter model based on YOLOv8 is proposed that improves recall and detection precision while reducing the number of parameters. First, the simple attention mechanism (SimAM) is used to build a stronger backbone network for feature extraction. Second, a lightweight content-aware reassembly of features (L-CARAFE) module is proposed to replace the up-sampling operation, aggregating contextual information over a larger receptive field. Finally, multiple decoupled heads with sparse parameters improve detection precision while reducing the number of parameters. Considering the complexity of traffic scenes, the effectiveness of the model is verified on the KITTI dataset and its generalisation on the COCO dataset. The model significantly improves recall and mean average precision (mAP) on both public datasets: the nano model improves recall and mAP by 3.1% and 0.9%, respectively, on the KITTI dataset with a parameter count of 2.95, and the small model achieves an mAP@0.5 of 60.6% on the COCO dataset.
Key words: traffic scenes; object detection; parameter sparsity; attention mechanism
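The abstract names SimAM as the attention mechanism placed in the backbone. As an illustration of why it adds no learnable parameters, the following is a minimal sketch of the SimAM weighting as it is commonly formulated, applied to one feature-map channel in plain Python. It is not the authors' implementation: the function name `simam_channel`, the list-of-rows input layout, and the default `e_lambda = 1e-4` are assumptions made for this example.

```python
import math


def simam_channel(channel, e_lambda=1e-4):
    """Apply SimAM-style attention to one channel given as a list of rows.

    Hypothetical helper for illustration; e_lambda is an assumed
    regularisation constant, not a value taken from the paper.
    """
    values = [v for row in channel for v in row]
    if len(values) < 2:
        raise ValueError("channel needs at least two activations")

    n = len(values) - 1                      # number of "other" neurons
    mu = sum(values) / len(values)           # channel mean
    sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
    var = sum(sum(row) for row in sq_dev) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for activations far from the mean.
            inv_energy = d / (4.0 * (var + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            out_row.append(v * weight)
        out.append(out_row)
    return out


if __name__ == "__main__":
    feature = [[1.0, 1.0], [1.0, 5.0]]
    for row in simam_channel(feature):
        print(row)
```

The weights are computed only from the channel's own statistics, so the module contributes nothing to the parameter count; activations that deviate more from the channel mean receive a larger weight.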