A Lightweight Image Object Detection Network Based on Pyramid Enhancement and Cross-Semantic Interaction
LU Wei
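(School of Internet of Things Engineering, Jiangsu Vocational College of Information Technology, Wuxi, Jiangsu 214153)
Abstract:
In recent years, lightweight object detection has made significant progress. However, existing mainstream methods lack the extraction of multi-scale semantic information and ignore the relationship between deep semantic features and shallow detail features. To address these shortcomings, the Pyramid Pooling Enhanced Multi-scale Network (PPMENet) is proposed. An Efficient Pyramid Pooling Block (EPPB) is designed to extract multi-scale deep semantic information and strengthen the model's feature representation ability. In addition, a Cross Semantic Level Interaction Attention Module (CSIAM) is designed to strengthen the connections between features at different semantic levels. Experimental results on the MS COCO 2017 test set show that PPMENet achieves 28.0% average precision with a model size of only 2.16×10⁶ and 0.97 GFLOPs, at an inference speed of 218 frame/s. Compared with other methods, PPMENet achieves a good balance between accuracy and execution efficiency.
Key words:  real-time image object detection  lightweight network  multi-scale feature extraction  attention mechanism  feature fusion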
DOI:10.20079/j.issn.1001-893x.240812001
Funding:
Pyramid-enhanced and Cross-semantic Interaction Network for Lightweight Real-time Image Object Detection
LU Wei
(School of Internet of Things Engineering, Jiangsu Vocational College of Information Technology, Wuxi 214153, China)
Abstract:
In recent years, driven by advances in deep learning, lightweight object detection has made significant progress. However, mainstream lightweight detectors neglect the extraction of multi-scale semantic information and overlook the relationship between deep semantic features and shallow detail features. To address these shortcomings, a Pyramid Pooling Enhanced Multi-scale Network (PPMENet) is proposed. An Efficient Pyramid Pooling Block (EPPB) is designed to extract multi-scale deep semantic information, strengthening the feature expression ability of the model. In addition, a Cross Semantic Level Interaction Attention Module (CSIAM) is designed to enhance information interaction between features at different semantic levels. Experimental results on the MS COCO 2017 test set show that PPMENet achieves 28.0% average precision with a model size of only 2.16×10⁶ and 0.97 GFLOPs, at an inference speed of 218 frame/s. Compared with other methods, PPMENet achieves a good balance between detection accuracy and model execution efficiency.
Key words:  real-time image object detection  lightweight network  multi-scale feature extraction  attention mechanism  feature fusion
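
The abstract does not specify the internals of the EPPB, so the following is only a minimal sketch of generic pyramid pooling, the general idea the block's name refers to, and not the author's design. It average-pools a single-channel feature map at several grid resolutions and concatenates the results into one multi-scale descriptor. The function names `adaptive_avg_pool` and `pyramid_pool` and the bin sizes (1, 2, 4) are assumptions chosen for illustration.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D feature map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Region start uses floor division, region end uses ceiling division,
        # so every output cell covers at least one input cell.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(fmap, bin_sizes=(1, 2, 4)):
    """Concatenate pooled grids from every scale into one flat descriptor.

    bin_sizes is a hypothetical choice, not a value taken from the paper.
    """
    descriptor = []
    for bins in bin_sizes:
        for row in adaptive_avg_pool(fmap, bins):
            descriptor.extend(row)
    return descriptor


if __name__ == "__main__":
    # A toy 4 x 4 feature map holding the values 0..15.
    feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
    print(pyramid_pool(feature_map))
```

For a 4 × 4 input and bin sizes (1, 2, 4), the descriptor has 1 + 4 + 16 = 21 entries: the coarse entries summarize global context while the fine entries preserve local detail. A real detector would apply this per channel on tensors and typically upsample and fuse the pooled maps rather than flatten them.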