Document detail
ID

oai:arXiv.org:2404.18895

Topic
Computer Science - Computer Vision...
Author
Liu, Chenyang; Chen, Keyan; Chen, Bowen; Zhang, Haotian; Zou, Zhengxia; Shi, Zhenwei
Category

Computer Science

Year

2024

listing date

5/29/2024

Keywords
bi-temporal change propose features rscama remote sensing spatial efficient mamba model rsicc
Abstract

Remote Sensing Image Change Captioning (RSICC) aims to describe, in natural language, surface changes between multi-temporal remote sensing images, including the categories and locations of the changed objects and the dynamics of the change (e.g., added or disappeared).

This poses challenges to spatial and temporal modeling of bi-temporal features.

Although previous methods have made progress in spatial change perception, they remain weak at joint spatial-temporal modeling.

To address this, we propose RSCaMa, a novel model that achieves efficient joint spatial-temporal modeling through multiple CaMa layers, enabling iterative refinement of bi-temporal features.

For efficient spatial modeling, we introduce Mamba, a recently popular state space model with a global receptive field and linear complexity, into the RSICC task and propose the Spatial Difference-aware SSM (SD-SSM). This overcomes the limitations of previous CNN- and Transformer-based methods in receptive field and computational complexity.
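The "global receptive field and linear complexity" claim can be illustrated with a minimal sketch of a linear state space recurrence. This is not the authors' SD-SSM and not Mamba itself, which uses input-dependent (selective) parameters and vector-valued states. The function name `ssm_scan` and the fixed scalar parameters `a`, `b`, and `c` are assumptions chosen only for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    Recurrence (illustrative, fixed parameters):
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t

    One pass over the sequence gives O(L) cost, and each output y_t
    depends on every earlier input through the hidden state h.
    """
    h = 0.0  # hidden state, initialised to zero
    ys = []
    for x in xs:
        h = a * h + b * x  # update the state with the current token
        ys.append(c * h)   # read out the current state
    return ys


if __name__ == "__main__":
    # An impulse at the first position decays through all later outputs.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
```

The single loop is what makes the cost linear in sequence length, in contrast to the pairwise token interactions of self-attention.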

SD-SSM sharpens the model's ability to capture spatial changes.

For efficient temporal modeling, considering the potential correlation between Mamba's temporal scanning characteristics and the temporal nature of RSICC, we propose the Temporal-Traversing SSM (TT-SSM), which scans bi-temporal features in a temporal cross-wise manner, enhancing the model's temporal understanding and information interaction.
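One plausible reading of "temporal cross-wise" scanning is that tokens from the two acquisition times are interleaved into a single sequence before being scanned. The sketch below shows that reading only. The abstract does not specify the exact ordering, so the helper names `interleave_bitemporal` and `split_bitemporal` and the token layout are assumptions, not the authors' implementation.

```python
def interleave_bitemporal(tokens_t1, tokens_t2):
    """Interleave two equal-length token lists as [t1_0, t2_0, t1_1, t2_1, ...].

    Assumed layout: a sequential scan over the result alternates between
    the two time steps at every spatial position.
    """
    if len(tokens_t1) != len(tokens_t2):
        raise ValueError("bi-temporal token lists must have equal length")
    seq = []
    for before, after in zip(tokens_t1, tokens_t2):
        seq.append(before)
        seq.append(after)
    return seq


def split_bitemporal(seq):
    """Undo the interleaving, recovering the two per-time token lists."""
    return seq[0::2], seq[1::2]


if __name__ == "__main__":
    mixed = interleave_bitemporal(["a1", "a2"], ["b1", "b2"])
    print(mixed)
    print(split_bitemporal(mixed))
```

With this layout, the hidden state of a sequential scan carries information from one time step into the other at every position, which is one way to obtain the information interaction the abstract describes.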

Experiments validate the effectiveness of the efficient joint spatial-temporal modeling and demonstrate the outstanding performance of RSCaMa and the potential of Mamba in the RSICC task.

Additionally, we systematically compare three language decoders (Mamba, a GPT-style decoder, and a Transformer decoder), providing valuable insights for future RSICC research.

The code will be available at https://github.com/Chen-Yang-Liu/RSCaMa

Liu, Chenyang; Chen, Keyan; Chen, Bowen; Zhang, Haotian; Zou, Zhengxia; Shi, Zhenwei, 2024, RSCaMa: Remote Sensing Image Change Captioning with State Space Model
