Abstract
Recent efforts in semantic segmentation using deep learning frameworks have made notable advances. However, capturing objects that appear at multiple scales in an image remains a challenge. In this paper, we address the semantic segmentation task based on a transformer architecture. Unlike existing methods that capture multi-scale contextual information by fusing single-scale information from parallel paths, we propose a novel semantic segmentation network incorporating a transformer (TrSeg) that adaptively captures multi-scale information conditioned on the original contextual information. Given the original contextual information as keys and values, the transformer decoder transforms the multi-scale contextual information produced by the multi-scale pooling module, which serves as the queries. The experimental results show that TrSeg outperforms other methods of capturing multi-scale information by large margins.
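The core operation the abstract describes is cross-attention: multi-scale pooled features act as queries over the original feature map. Below is a minimal sketch of that idea in PyTorch; the class name, pooling scales (PSPNet-style 1/2/3/6), channel width, and layer counts are all illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class TrSegDecoderSketch(nn.Module):
    """Sketch of the TrSeg idea: multi-scale pooled tokens (queries)
    cross-attend to the original contextual tokens (keys/values).
    All hyperparameters here are hypothetical."""

    def __init__(self, channels=256, pool_sizes=(1, 2, 3, 6), num_heads=8):
        super().__init__()
        # Multi-scale pooling module: pool the feature map at several scales.
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in pool_sizes)
        # Standard transformer decoder: queries cross-attend to the memory.
        layer = nn.TransformerDecoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)

    def forward(self, feats):
        # feats: (B, C, H, W) backbone feature map.
        # Keys/values: original contextual information, flattened to tokens.
        memory = feats.flatten(2).transpose(1, 2)            # (B, H*W, C)
        # Queries: multi-scale contextual tokens from the pooling module.
        queries = torch.cat(
            [p(feats).flatten(2).transpose(1, 2) for p in self.pools], dim=1
        )                                                    # (B, sum(s*s), C)
        # The decoder adaptively mixes multi-scale queries with the
        # original context via cross-attention.
        return self.decoder(queries, memory)                 # (B, N_q, C)

# Usage: 50 query tokens (1+4+9+36) attend over a 32x32 feature map.
out = TrSegDecoderSketch()(torch.randn(2, 256, 32, 32))      # (2, 50, 256)
```

In this reading, the decoder's output tokens would still need to be projected back to the spatial grid for per-pixel prediction; the paper's actual head design may differ.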
| Original language | English |
| --- | --- |
| Pages (from-to) | 29-35 |
| Number of pages | 7 |
| Journal | Pattern Recognition Letters |
| Volume | 148 |
| DOIs | |
| Publication status | Published - 2021 Aug |
Bibliographical note
Funding Information: This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-19-1-4001.
Publisher Copyright:
© 2021 Elsevier B.V.
Keywords
- Multi-scale contextual information
- Scene understanding
- Semantic segmentation
- Transformer
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence