Abstract
Unsupervised domain adaptation (UDA) is an important research topic in semantic segmentation tasks, wherein pixel-wise annotations are often difficult to collect in a test environment due to their high labeling costs. Previous UDA-based studies trained their segmentation networks using labeled synthetic data and unlabeled realistic data as source and target domains, respectively. However, they often fail to distinguish semantically similar classes, such as person vs. rider and road vs. sidewalk, because these classes are prone to confusion in domain-shifted environments. In this paper, we introduce a Language-Conditioned Masked Segmentation Model (LC-MSM), which is a new framework for the joint learning of context relations and domain-agnostic information for domain-adaptive semantic segmentation. Specifically, we reconstruct semantic labels with masked image conditions on the generalized text embeddings of the corresponding semantic class from OpenCLIP, which contains domain-invariant knowledge from large-scale data. To this end, we correlate the generalized text embeddings onto the per-pixel image feature of a masked image that learned the spatial context to further append domain-agnostic language information to the semantic decoder. This facilitates the generalization of our model to the target domain via the learning of context information within individual training instances, while considering cross-domain representations spanning the entire dataset. LC-MSM achieves an unprecedented UDA performance of 71.8 and 62.8 mIoU on GTA→Cityscapes and SYNTHIA→Cityscapes, respectively, which corresponds to an improvement of +3.5 and +1.9 percent points over the baseline method.
Original language | English |
---|---|
Article number | 110201 |
Journal | Pattern Recognition |
Volume | 148 |
DOIs | |
Publication status | Published - 2024 Apr |
Bibliographical note
Publisher Copyright:© 2023 Elsevier Ltd
Keywords
- Semantic segmentation
- Text-image correlation
- Unsupervised domain adaptation
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence