oai:arXiv.org:2210.01069
Computer Science
2022
10/5/2022
Recently, image restoration transformers have achieved performance comparable to that of previous state-of-the-art CNNs.
However, how to efficiently leverage such architectures remains an open problem.
In this work, we present Dual-former, whose key insight is to combine the powerful global modeling ability of self-attention modules with the local modeling ability of convolutions in a single architecture.
With convolution-based Local Feature Extraction modules in the encoder and the decoder, we adopt a novel Hybrid Transformer Block only in the latent layer, where it models long-range dependencies in the spatial dimensions and handles the uneven distribution across channels.
Such a design avoids the substantial computational complexity of previous image restoration transformers and achieves superior performance on multiple image restoration tasks.
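The saving from confining self-attention to the latent layer can be illustrated with a rough count. The sketch below assumes plain global self-attention, whose pairwise-score cost grows with the square of the number of spatial tokens, and a hypothetical latent layer downsampled by a factor of 8 per side. The downsampling factor, the 256x256 input size, and the function name `attention_pairs` are illustrative assumptions, not details taken from the paper.

```python
def attention_pairs(height, width, downsample=1):
    """Count the token pairs scored by global self-attention on a feature map.

    Assumes one token per spatial position after each side is reduced by
    `downsample` (a hypothetical factor, not taken from the paper).
    """
    tokens = (height // downsample) * (width // downsample)
    return tokens * tokens


# Attention applied at the full input resolution (assumed 256x256).
full = attention_pairs(256, 256)
# Attention applied only in an assumed 8x-downsampled latent layer.
latent = attention_pairs(256, 256, downsample=8)

print(full // latent)  # prints 4096
```

This counts attention pairs only. It ignores channel width and the cost of the convolutional modules, so it shows the scaling trend rather than reproducing the GFLOPs figures reported in the abstract.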
Experiments demonstrate that Dual-former achieves a 1.91 dB gain over the state-of-the-art MAXIM method on the Indoor dataset for single image dehazing while consuming only 4.2% of MAXIM's GFLOPs.
For single image deraining, it exceeds the SOTA method by 0.1 dB PSNR averaged over five datasets, with only 21.5% of its GFLOPs.
Dual-former also substantially surpasses the latest desnowing method on various datasets, with fewer parameters.
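To give a feel for the reported decibel gains, the sketch below converts a PSNR difference into the corresponding ratio of mean squared errors, using the standard definition PSNR = 10 log10(MAX^2 / MSE). It assumes both methods are scored on the same image with the same peak value. Because benchmark PSNR is usually averaged over many images, the result is indicative only. The function names are illustrative.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE of the better method divided by MSE of the baseline,
    for a PSNR gain of `gain_db` at the same peak value."""
    return 10.0 ** (-gain_db / 10.0)


# Gains quoted in the abstract: dehazing (1.91 dB) and deraining (0.1 dB).
for gain in (1.91, 0.1):
    ratio = mse_ratio_for_gain(gain)
    print(f"{gain} dB gain -> MSE multiplied by {ratio:.3f}")
```

Under these assumptions, a 1.91 dB gain corresponds to roughly a one-third reduction in mean squared error, while a 0.1 dB gain corresponds to a reduction of about 2%.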
Chen, Sixiang; Ye, Tian; Liu, Yun; Chen, Erkang, 2022, Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration