TrSeg: Transformer for semantic segmentation

  Youngsaeng Jin, David Han, Hanseok Ko*
  *Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Recent efforts in semantic segmentation using deep learning frameworks have made notable advances. However, capturing objects in an image at multiple scales remains a challenge. In this paper, we address the semantic segmentation task with a transformer-based architecture. Unlike existing methods, which capture multi-scale contextual information by fusing single-scale information from parallel paths, we propose a novel semantic segmentation network incorporating a transformer (TrSeg) that adaptively captures multi-scale information conditioned on the original contextual information. Given the original contextual information as keys and values, the multi-scale contextual information produced by the multi-scale pooling module serves as queries and is transformed by the transformer decoder. Experimental results show that TrSeg outperforms other methods of capturing multi-scale information by large margins.
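    The mechanism described in the abstract can be sketched in code: multi-scale pooled features act as transformer-decoder queries, while the flattened original feature map supplies the keys and values. The sketch below is a minimal PyTorch illustration under assumed settings; the module names (MultiScalePoolingQueries, TrSegHeadSketch), the pooling scales (1, 2, 3, 6), the dimensions, and the final fusion step are hypothetical and not taken from the paper.

```python
# Minimal sketch of the TrSeg idea: multi-scale pooled tokens as queries,
# original contextual features as keys/values in a transformer decoder.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePoolingQueries(nn.Module):
    """Pool the feature map at several scales and flatten to query tokens."""
    def __init__(self, dim, scales=(1, 2, 3, 6)):  # scales are assumed
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in scales)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                            # x: (B, C, H, W)
        tokens = [p(x).flatten(2).transpose(1, 2) for p in self.pools]
        return self.proj(torch.cat(tokens, dim=1))   # (B, sum(s*s), C)

class TrSegHeadSketch(nn.Module):
    """Multi-scale queries attend to the original context via a decoder."""
    def __init__(self, dim=256, num_classes=19, scales=(1, 2, 3, 6),
                 num_layers=2, num_heads=8):
        super().__init__()
        self.scales = scales
        self.queries = MultiScalePoolingQueries(dim, scales)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.classifier = nn.Conv2d(2 * dim, num_classes, kernel_size=1)

    def forward(self, feats):                        # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        memory = feats.flatten(2).transpose(1, 2)    # keys/values: (B, H*W, C)
        q = self.queries(feats)                      # queries: (B, Nq, C)
        out = self.decoder(tgt=q, memory=memory)     # refined tokens
        # Fold each scale's refined tokens back to its grid and upsample.
        pieces, idx = [], 0
        for s in self.scales:
            n = s * s
            grid = out[:, idx:idx + n].transpose(1, 2).reshape(B, C, s, s)
            pieces.append(F.interpolate(grid, size=(H, W), mode="bilinear",
                                        align_corners=False))
            idx += n
        fused = torch.stack(pieces).sum(0)           # merge scales (assumption)
        return self.classifier(torch.cat([feats, fused], dim=1))

if __name__ == "__main__":
    head = TrSegHeadSketch(dim=256, num_classes=19)
    feats = torch.randn(2, 256, 64, 64)              # stand-in backbone features
    print(head(feats).shape)                         # torch.Size([2, 19, 64, 64])
```

    In this reading of the abstract, cross-attention lets each pooled query adaptively weight the original per-pixel context, rather than fusing fixed parallel single-scale paths; how the paper actually folds the refined tokens back into a dense prediction may differ from the simple upsample-and-sum used here.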

    Original language: English
    Pages (from-to): 29-35
    Number of pages: 7
    Journal: Pattern Recognition Letters
    Volume: 148
    Publication status: Published - 2021 Aug

    Bibliographical note

    Funding Information:
    This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-19-1-4001.

    Publisher Copyright:
    © 2021 Elsevier B.V.

    Keywords

    • Multi-scale contextual information
    • Scene understanding
    • Semantic segmentation
    • Transformer

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Computer Vision and Pattern Recognition
    • Artificial Intelligence
