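Abstract:
Vision-based vehicle tracking is prone to drift because of interference such as complex backgrounds, low resolution, and illumination changes. To improve tracking performance in complex scenes, a visual vehicle tracking algorithm based on the attention mechanism is proposed. First, an attention-based Swin Transformer thoroughly mines and represents features, modeling global information. Second, an attention-based encoder fuses and enhances this information, unlocking the strong potential of the attention mechanism. Finally, a simple network of stacked RepVGG blocks predicts the target position. Experimental results show that on two large public benchmark datasets, LaSOT and UAV123, the proposed algorithm reaches precision of 78.4% and 89.6% and success rates of 69.3% and 69.8%, respectively, outperforming other mainstream trackers. Visualization and analysis of the tracking results on the vehicle video sequences of the OTB100 dataset show better results than the baseline STARK-S50: the algorithm tracks more stably and copes with a range of tracking challenges, including complex backgrounds, blur, similar objects, occlusion, dim lighting, and vehicle scale change and rotation.
Key words: complex scenarios; vehicle tracking; attention mechanism; Swin Transformer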
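The abstract names the attention mechanism and the Swin Transformer as the core of feature extraction but does not detail either. As a rough illustration only, the Python sketch below shows window-based self-attention, the basic operation inside a Swin Transformer block: tokens on an H×W grid are split into non-overlapping windows, and scaled dot-product attention, softmax(QKᵀ/√d)V, is computed inside each window. This is a simplified sketch under stated assumptions, not the authors' implementation: the helper names (`window_partition`, `window_self_attention`) are hypothetical, Q, K and V are the raw tokens (no learned projections), and multi-head attention, relative position bias, and the shifted windows that let Swin exchange information across windows are all omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention softmax(Q K^T / sqrt(d)) V on lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def window_partition(grid, window):
    """Split an H x W grid of token vectors into non-overlapping window x window groups (row-major)."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must be divisible by the window size"
    windows = []
    for r in range(0, h, window):
        for c in range(0, w, window):
            windows.append([grid[i][j]
                            for i in range(r, r + window)
                            for j in range(c, c + window)])
    return windows


def window_self_attention(grid, window):
    """Self-attention computed independently inside each window.

    Simplification (assumption): Q = K = V = the raw tokens; real Swin blocks use
    learned projections, multiple heads, relative position bias and shifted windows.
    """
    return [attention(tokens, tokens, tokens) for tokens in window_partition(grid, window)]
```

With a 4×4 grid and 2×2 windows, `window_partition` yields four windows of four tokens each, and every output token is a convex combination of the tokens in its own window. Restricting attention to fixed-size windows keeps the cost linear in the number of tokens rather than quadratic, which is the motivation for the windowed design in Swin.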
DOI: 10.20079/j.issn.1001-893x.230821002

Funding: National Key Research and Development Program of China (2021YFC3320300); Project of the Education Department of Liaoning Province (LJ212413631005)

A Vehicle Tracking Method Based on Attention Mechanism in Complex Scenarios
ZHU Hong, LI Yingqiu, ZHOU Hui, SHEN Yujun
(1. College of Computer and Software, Dalian Neusoft Information University, Dalian 116023, China; 2. Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou 310052, China)
Abstract:
Vision-based vehicle tracking techniques often suffer from tracking drift caused by complex backgrounds, low-resolution images, and variations in lighting conditions. To address these challenges and enhance tracking performance in complex scenarios, a visual vehicle tracking algorithm based on attention mechanisms is proposed. The algorithm uses an attention-based Swin Transformer to explore and represent features effectively, enabling comprehensive modeling of global information. An attention-based encoder then fuses and augments the gathered information, harnessing the full potential of attention mechanisms. Finally, a simple network of stacked RepVGG blocks accurately predicts the position of the tracked vehicle. Experimental evaluations on two public large-scale benchmark datasets, LaSOT and UAV123, show that the algorithm achieves precision of 78.4% and 89.6% and success rates of 69.3% and 69.8%, respectively, outperforming existing tracking methods. Additionally, extensive visualization and analysis of vehicle video sequences from the OTB100 dataset show better results than the baseline STARK-S50: the algorithm tracks more stably and can resist various tracking challenges such as complex backgrounds, blurring, similar objects, occlusion, dim lighting, and vehicle scale transformation and rotation.
Key words: complex scenarios; vehicle tracking; attention mechanism; Swin Transformer
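Under the one-pass evaluation (OPE) protocol commonly used on OTB, LaSOT and UAV123, precision is the fraction of frames whose predicted box center lies within 20 pixels of the ground-truth center, and success is usually summarized as the area under the success plot, i.e., the mean fraction of frames whose IoU with the ground truth exceeds each threshold in 0, 0.05, ..., 1. The abstract does not state which toolkit produced its figures, so the Python sketch below is only a minimal rendering of these standard definitions. It assumes boxes given as (x, y, w, h) with (x, y) the top-left corner; the function names are hypothetical, and official toolkits differ in details such as the threshold grid and the handling of frames where the target is absent (LaSOT additionally reports a normalized precision).

```python
import math


def center(box):
    """Center (cx, cy) of a box given as (x, y, w, h) with (x, y) the top-left corner."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision(pred_boxes, gt_boxes, threshold=20.0):
    """Fraction of frames whose center location error is at most `threshold` pixels."""
    hits = 0
    for p, g in zip(pred_boxes, gt_boxes):
        (px, py), (gx, gy) = center(p), center(g)
        if math.hypot(px - gx, py - gy) <= threshold:
            hits += 1
    return hits / len(gt_boxes)


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Area under the success plot: mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # A frame counts as a success at threshold t when its IoU is strictly greater than t.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    pred = [(15, 0, 10, 10), (25, 0, 10, 10)]
    print(precision(pred, gt))  # 0.5 (center errors: 15 px and 25 px)
```

Because the success test uses a strict `>`, a perfect tracker scores 20/21 rather than 1 on this 21-threshold grid; this is the kind of detail on which evaluation toolkits can differ, so figures from different papers are only comparable under the same toolkit.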