TrSeg: Transformer for semantic segmentation

Youngsaeng Jin, David Han, Hanseok Ko

Research output: Contribution to journal › Article › peer-review

62 Citations (Scopus)

Abstract

Recent efforts in semantic segmentation using deep learning frameworks have made notable advances. However, capturing objects in an image at multiple scales remains a challenge. In this paper, we address the semantic segmentation task with a transformer architecture. Unlike existing methods that capture multi-scale contextual information by fusing single-scale information from parallel paths, we propose a novel semantic segmentation network incorporating a transformer (TrSeg) that adaptively captures multi-scale information conditioned on the original contextual information. Given the original contextual information as keys and values, and the multi-scale contextual information from the multi-scale pooling module as queries, the transformer decoder produces the transformed features. The experimental results show that TrSeg outperforms other methods of capturing multi-scale information by large margins.
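Since this record carries only the abstract, the following is a minimal PyTorch sketch of the key/value/query arrangement described above: pooled multi-scale features act as queries, while the flattened original feature map supplies keys and values to a transformer decoder. The pooling scales, 1x1 projections, and decoder depth are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePoolingQueries(nn.Module):
    """Hypothetical multi-scale pooling module: pools the feature map at
    several scales and concatenates the pooled vectors into one query
    sequence. Pool sizes are assumptions for illustration."""
    def __init__(self, dim, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.proj = nn.ModuleList(nn.Conv2d(dim, dim, 1) for _ in pool_sizes)

    def forward(self, feats):                            # feats: (B, C, H, W)
        queries = []
        for size, proj in zip(self.pool_sizes, self.proj):
            pooled = F.adaptive_avg_pool2d(feats, size)  # (B, C, s, s)
            pooled = proj(pooled).flatten(2).transpose(1, 2)  # (B, s*s, C)
            queries.append(pooled)
        return torch.cat(queries, dim=1)                 # (B, N_q, C)

class TrSegSketch(nn.Module):
    """Cross-attention step: multi-scale pooled features are the queries;
    the original contextual feature map supplies keys and values."""
    def __init__(self, dim=256, num_heads=8, num_layers=2):
        super().__init__()
        self.pool = MultiScalePoolingQueries(dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, feats):                            # feats: (B, C, H, W)
        memory = feats.flatten(2).transpose(1, 2)        # keys/values: (B, H*W, C)
        queries = self.pool(feats)                       # (B, N_q, C)
        return self.decoder(tgt=queries, memory=memory)  # (B, N_q, C)

# Usage: with pool sizes (1, 2, 3, 6) there are 1+4+9+36 = 50 queries.
feats = torch.randn(2, 256, 64, 64)   # backbone features (assumed shape)
out = TrSegSketch()(feats)            # -> (2, 50, 256)
```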

Original language: English
Pages (from-to): 29-35
Number of pages: 7
Journal: Pattern Recognition Letters
Volume: 148
DOIs
Publication status: Published - August 2021

Bibliographical note

Funding Information:
This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-19-1-4001.

Publisher Copyright:
© 2021 Elsevier B.V.

Keywords

  • Multi-scale contextual information
  • Scene understanding
  • Semantic segmentation
  • Transformer

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
