The Transformer architecture has achieved noteworthy performance in recent image super-resolution research. However, current transformer-based methods still fall short of fully harnessing domain-specific information within images, particularly when applied to broader-scale remote sensing images that contain diverse landscape objects in a single scene. Remote sensing images have relatively lower resolution than common super-resolution training datasets, and each landscape object covers only a small area of the image. These characteristics significantly reduce the pixels available to attention-based restoration in existing transformer-based methods. To address this challenge and enhance domain-specific multi-object image reconstruction, we introduce FocalSR, a Transformer model featuring FOurier-transform Cross Attention Layers for Super-Resolution. Drawing inspiration from state-of-the-art Transformer models such as the Hybrid Attention Transformer (HAT), FocalSR incorporates channel-focused and window-centric self-attention mechanisms. By integrating Fast Fourier Convolution into the cross-attention layer, FocalSR extends its capacity to capture image-wide information and intricate details in low-resolution images. Through unified-task pretraining during model development, we validate the efficacy of these enhancements with extensive testing, which yields substantial performance improvements. Notably, our experiments showcase FocalSR's superior performance on remote sensing datasets, demonstrating a 1 dB improvement in PSNR over other state-of-the-art methods. Significant improvements are also observed in challenging scenarios such as pattern restoration and vegetation detail preservation, underscoring the transformative potential of FocalSR in advancing image processing and domain-specific vision tasks.
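
To make the key idea concrete, the following is a minimal sketch of a Fast-Fourier-Convolution-style spectral branch, the mechanism the abstract describes for capturing image-wide context: a pointwise convolution applied in the frequency domain gives every output pixel a global receptive field. This is a hypothetical illustration in PyTorch, not the paper's exact FocalSR layer; the class name `SpectralTransform` and its internals are assumptions.

```python
import torch
import torch.nn as nn


class SpectralTransform(nn.Module):
    """FFC-style spectral branch (illustrative sketch, not the paper's layer).

    A 1x1 convolution in the Fourier domain mixes information across the
    entire spatial extent, so every output pixel can attend to the whole
    image -- the property the abstract attributes to Fast Fourier Convolution.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along channels, hence 2x.
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Real FFT over the spatial dims: complex tensor (b, c, h, w//2 + 1).
        freq = torch.fft.rfft2(x, norm="ortho")
        # Treat real and imaginary parts as extra channels for the conv.
        f = torch.cat([freq.real, freq.imag], dim=1)
        f = self.act(self.conv(f))
        real, imag = f.chunk(2, dim=1)
        # Back to the spatial domain; output matches the input size.
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


# Shape check: the spectral branch is size-preserving.
x = torch.randn(1, 8, 16, 16)
y = SpectralTransform(8)(x)
print(tuple(y.shape))  # (1, 8, 16, 16)
```

In a cross-attention layer such a branch would run alongside the local (spatial-convolution) path, supplying the global features that window-based attention alone misses on small, scattered landscape objects.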