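Abstract:
Multi-label remote sensing image classification aims to predict the multiple interrelated objects that appear in a remote sensing image, and text labels can supply rich semantic information for this task. However, most existing multi-label image classification methods do not fully exploit the visual-semantic information carried by image-text pairs. To address this problem, a multi-label remote sensing image classification algorithm based on Bi-text Prompts and Multi-similarity (BTPMS) learning is proposed. The algorithm first uses Bi-text Prompts (BTP) built from scene and object label text to provide rich prior knowledge. It then takes the association between scene and object labels into account and computes multiple similarities between the resulting text features and image features. Finally, it uses the similarity scores to perform multi-label remote sensing image classification. In addition, a novel Local Feature Attention (LFA) module is designed to capture local structures in images along the spatial and channel dimensions. Extensive experiments on two benchmark remote sensing datasets show that the proposed algorithm outperforms the compared multi-label image classification methods.
Keywords: remote sensing images; multi-label image classification; vision-language pre-training; prompt learning; local feature attention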
DOI: 10.20079/j.issn.1001-893x.231127005
Funding: General Program of the National Natural Science Foundation of China (62371084)
|
Bi-text Prompts and Multi-similarity Learning for Multi-label Remote Sensing Image Classification
BAI Shufen, SONG Tiecheng
(School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Abstract:
Multi-label remote sensing image classification aims to predict multiple interrelated objects present in remote sensing images, where text labels provide rich semantic information. However, most current multi-label image classification methods do not adequately exploit the information in visual-semantic image-text pairs. To address this issue, a multi-label remote sensing image classification algorithm based on Bi-text Prompts and Multi-similarity (BTPMS) learning is proposed. The algorithm first leverages Bi-text Prompts (BTP) built from scene and object label text to provide rich prior knowledge. Subsequently, considering the correlation between scene and object labels, it calculates multiple similarities between the resulting text features and image features. Finally, it uses the similarity scores for multi-label remote sensing image classification. Additionally, a novel Local Feature Attention (LFA) module is designed to capture local structures in images along both the spatial and channel dimensions. Extensive experiments on two benchmark remote sensing datasets demonstrate that the proposed algorithm outperforms the compared multi-label image classification methods.
Key words: remote sensing images; multi-label image classification; vision-language pre-training; prompt learning; local feature attention
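
The abstract describes scoring object labels from several similarities between image features and scene and object text features, but it does not give the formulas. The following is a minimal sketch of one way such scores could be fused, not the BTPMS method itself. The function names, the fusion rule (image-object similarity plus a scene-weighted scene-object term), and the parameters `scale`, `scene_weight`, and `threshold` are all assumptions made for illustration. The toy vectors stand in for the outputs of pretrained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_similarity_scores(image_feat, object_feats, scene_feats,
                            scale=10.0, scene_weight=0.5):
    """Return one score per object label (hypothetical fusion rule)."""
    # Similarity 1: image feature vs. each object-label text feature.
    image_object = [cosine(image_feat, o) for o in object_feats]
    # Similarity 2: image feature vs. each scene-label text feature,
    # converted into a soft scene assignment for this image.
    scene_probs = softmax([scale * cosine(image_feat, s) for s in scene_feats])
    scores = []
    for j, obj in enumerate(object_feats):
        # Similarity 3: scene text vs. object text, weighted by the soft
        # scene assignment (an assumed stand-in for scene-object correlation).
        context = sum(p * cosine(s, obj)
                      for p, s in zip(scene_probs, scene_feats))
        scores.append(image_object[j] + scene_weight * context)
    return scores


def predict_labels(scores, threshold=0.5):
    """Multi-label decision: keep every label whose score exceeds the threshold."""
    return [s > threshold for s in scores]


# Toy 3-D features with made-up numbers, standing in for encoder outputs.
image_feat = [1.0, 0.0, 0.0]
object_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. "water", "building"
scene_feats = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]   # e.g. "beach", "city"
scores = multi_similarity_scores(image_feat, object_feats, scene_feats)
print(predict_labels(scores))
```

In this toy case the image aligns with the first object prompt and with the scene whose text is close to that object, so only the first label passes the threshold. Unlike single-label classification, each label is thresholded independently, so any number of labels can be predicted for one image.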