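Abstract: |
Optical imaging underwater is affected by light attenuation and scattering, which degrade image quality and target resolution and hinder underwater target detection. To address this problem, an efficient underwater target detection framework, IEMAyoloViT, is proposed. The framework combines YOLOViT, a YOLOv8 algorithm with an improved Vision Transformer (ViT) backbone, with a C2f module that incorporates Efficient Multi-scale Attention (EMA), in order to counter the dispersion of attention during target feature extraction. In addition, the Inner-CIoU loss function is improved so that auxiliary bounding boxes of different scales accelerate bounding box regression. Experimental results show that on the Underwater Robot Professional Contest (URPC) 2021 dataset, IEMAyoloViT reaches an mAP50 of 83.2%, which is 9.2% higher than YOLOv8, and an mAP50:95 that is 1.0% higher than YOLOv8, demonstrating the effectiveness and application potential of IEMAyoloViT. |
Keywords: underwater target detection; deep learning; Vision Transformer; attention mechanism |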
DOI:10.20079/j.issn.1001-893x.231206005 |
|
Funding: National Key Research and Development Program of China (2019YFB1803500); Sichuan Science and Technology Program (2020YJ0016)
|
IEMAyoloViT: An Underwater Target Detection Algorithm Based on Improved YOLOv8
SHI Kequan, LI Qi, SUI Hao, ZHU Hongna
(School of Physical Science and Technology, Southwest Jiaotong University, Chengdu 610031, China)
Abstract: |
Optical imaging underwater is adversely affected by light attenuation and scattering. These factors reduce image quality and target resolution, which impedes underwater target detection. To address these challenges, an efficient underwater target detection model, IEMAyoloViT, is proposed. The model combines YOLOViT, a YOLOv8 algorithm with an improved Vision Transformer (ViT) backbone, with a C2f module that integrates Efficient Multi-scale Attention (EMA). This design counters the dispersion of attention during target feature extraction. Furthermore, the model uses an improved Inner-CIoU loss function, in which auxiliary bounding boxes at different scales speed up bounding box regression. The results show that, on the Underwater Robot Professional Contest (URPC) 2021 dataset, IEMAyoloViT attains an mAP50 of 83.2%, an improvement of 9.2% over YOLOv8. Its mAP50:95 also surpasses that of YOLOv8 by 1.0%, which demonstrates the effectiveness and application potential of IEMAyoloViT in underwater target detection. |
Key words: underwater target detection; deep learning; Vision Transformer; attention mechanism |
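
The abstract mentions auxiliary bounding boxes at different scales without giving the formula. As a rough illustration only, the sketch below assumes the usual Inner-IoU idea: both boxes are rescaled about their own centers by a ratio, and IoU is computed on the rescaled boxes. The function names and the `ratio` parameter are hypothetical, and the sketch covers neither the CIoU penalty terms nor the paper's specific improvements.

```python
def iou(a, b):
    """Plain IoU for two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Return an auxiliary box with the same center and sides scaled by `ratio`."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * ratio / 2.0
    half_h = (box[3] - box[1]) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_iou(pred, target, ratio=0.75):
    """IoU between the auxiliary (rescaled) boxes of `pred` and `target`.

    ratio < 1 shrinks both boxes, ratio > 1 enlarges them; ratio = 1
    reduces to plain IoU. The default of 0.75 is an arbitrary placeholder.
    """
    return iou(scale_box(pred, ratio), scale_box(target, ratio))
```

With `ratio` below 1 the auxiliary boxes overlap less than the originals, and with `ratio` above 1 they overlap more, so the ratio controls how the overlap term responds to a given offset between prediction and target.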