LC-MSM: Language-Conditioned Masked Segmentation Model for unsupervised domain adaptation

Young Eun Kim, Yu Won Lee, Seong Whan Lee

Research output: Contribution to journal, Article, peer-review

Abstract

Unsupervised domain adaptation (UDA) is an important research topic in semantic segmentation, where pixel-wise annotations are often difficult to collect in the test environment because of their high labeling cost. Previous UDA studies train segmentation networks on labeled synthetic data as the source domain and unlabeled real-world data as the target domain. However, these methods often fail to distinguish semantically similar classes, such as person vs. rider and road vs. sidewalk, because such classes are easily confused under domain shift. In this paper, we introduce the Language-Conditioned Masked Segmentation Model (LC-MSM), a new framework that jointly learns context relations and domain-agnostic information for domain-adaptive semantic segmentation. Specifically, we reconstruct semantic labels from masked images, conditioned on generalized text embeddings of the corresponding semantic classes obtained from OpenCLIP, which encode domain-invariant knowledge learned from large-scale data. To this end, we correlate these text embeddings with the per-pixel image features of the masked image, which capture spatial context, thereby supplying domain-agnostic language information to the semantic decoder. This helps the model generalize to the target domain by learning context information within individual training instances while also accounting for cross-domain representations that span the entire dataset. LC-MSM achieves an unprecedented UDA performance of 71.8 and 62.8 mIoU on GTA→Cityscapes and SYNTHIA→Cityscapes, respectively, improvements of +3.5 and +1.9 percentage points over the baseline method.
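The abstract describes correlating class text embeddings with the per-pixel features of a masked image. The sketch below illustrates one plausible way to do this, assuming random square-patch masking, cosine similarity as the correlation, and channel concatenation as the fusion step. None of these choices, nor the function names (`random_patch_mask`, `text_pixel_correlation`, `language_conditioned_features`), are taken from the paper; they are hypothetical. Random arrays stand in for the image encoder output and the OpenCLIP class-name embeddings.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def random_patch_mask(image: np.ndarray, patch: int, ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Zero out a random subset of non-overlapping patch x patch blocks.

    Hypothetical masking scheme: each block is hidden with probability `ratio`.
    """
    h, w = image.shape[:2]
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    keep = rng.random((h // patch, w // patch)) >= ratio  # True = block stays visible
    mask = np.kron(keep, np.ones((patch, patch)))          # upsample to pixel grid
    return image * mask[..., None]


def text_pixel_correlation(pixel_feats: np.ndarray, text_embeds: np.ndarray) -> np.ndarray:
    """Cosine similarity of each pixel feature (H, W, D) with each class embedding (C, D).

    Returns an (H, W, C) map of per-pixel class affinities in [-1, 1].
    """
    f = l2_normalize(pixel_feats)
    t = l2_normalize(text_embeds)
    return np.einsum("hwd,cd->hwc", f, t)


def language_conditioned_features(pixel_feats: np.ndarray, text_embeds: np.ndarray) -> np.ndarray:
    """Append the correlation map to the pixel features (assumed fusion before decoding)."""
    corr = text_pixel_correlation(pixel_feats, text_embeds)
    return np.concatenate([pixel_feats, corr], axis=-1)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
masked = random_patch_mask(image, patch=16, ratio=0.5, rng=rng)

# Stand-ins for the encoder output on the masked image and the class text embeddings.
feats = rng.standard_normal((16, 16, 32))
text = rng.standard_normal((19, 32))  # one embedding per Cityscapes evaluation class

fused = language_conditioned_features(feats, text)
print(masked.shape, fused.shape)  # (64, 64, 3) (16, 16, 51)
```

Normalizing both the pixel features and the text embeddings makes every appended channel a cosine similarity, so the decoder receives per-pixel class-affinity scores that do not depend on feature magnitude. How LC-MSM actually fuses these scores with the decoder is described in the paper itself, not here.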

Original language: English
Article number: 110201
Journal: Pattern Recognition
Volume: 148
Publication status: Published - 2024 Apr

Bibliographical note

Publisher Copyright:
© 2023 Elsevier Ltd

Keywords

  • Semantic segmentation
  • Text-image correlation
  • Unsupervised domain adaptation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
