Abstract
Recent prompt-driven zero-shot adaptation methods offer a promising way to handle domain shifts in semantic segmentation by learning with features simulated from natural language prompts. However, these methods typically depend on a fixed set of predefined domain descriptions, which limits their capacity to generalize to previously undefined domains and often necessitates retraining when encountering novel environments. To address this challenge, we propose a Generalized Prompt-driven Zero-shot Domain Adaptive Segmentation framework that enables flexible and robust cross-domain segmentation by learning to map target domain features into the source domain space. This allows inference to be performed through a unified and well-optimized source model, without requiring target data-based or prompt-based retraining when encountering novel conditions. Our framework comprises two key modules: a Low-level Feature Rectification (LLFR) module that aligns visual styles using a historical source-style memory bank, and a High-level Semantic Modulation (HLSM) module that applies language-guided affine transformations to align high-level semantics. Together, these modules perform multi-level feature adaptation that maps target inputs into the source domain space, allowing the model to handle unseen domains effectively at test time. Extensive experiments on multiple zero-shot domain adaptation benchmarks show that our method consistently outperforms previous approaches.
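The abstract does not give the internals of LLFR or HLSM, so the following is only a rough illustration of the two ideas it names, not the authors' implementation. It assumes that style alignment can be approximated by AdaIN-style matching of per-channel statistics against a bank of stored source statistics, and that a language-guided affine transformation can be approximated by a FiLM-style scale and shift predicted from a text embedding. All function names, array shapes, and weight matrices here are hypothetical.

```python
import numpy as np


def channel_stats(feat, eps=1e-5):
    """Per-channel mean and std of a (C, H, W) feature map."""
    mean = feat.mean(axis=(1, 2))
    std = np.sqrt(feat.var(axis=(1, 2)) + eps)
    return mean, std


def rectify_low_level(feat, bank_means, bank_stds):
    """AdaIN-style restyling toward the closest stored source style.

    Assumption: the memory bank holds N entries of per-channel statistics,
    given as two (N, C) arrays. The nearest-entry lookup is a stand-in.
    """
    mean, std = channel_stats(feat)
    # Pick the bank entry whose statistics are closest to the input's.
    dist = (np.linalg.norm(bank_means - mean, axis=1)
            + np.linalg.norm(bank_stds - std, axis=1))
    k = int(np.argmin(dist))
    # Remove the input's own style, then apply the stored source style.
    normalized = (feat - mean[:, None, None]) / std[:, None, None]
    return normalized * bank_stds[k][:, None, None] + bank_means[k][:, None, None]


def modulate_high_level(feat, text_emb, w_gamma, w_beta):
    """FiLM-style affine transform predicted from a text embedding.

    Assumption: w_gamma and w_beta are (C, D) projections of a (D,) embedding.
    """
    gamma = 1.0 + w_gamma @ text_emb  # per-channel scale, identity at zero weights
    beta = w_beta @ text_emb          # per-channel shift
    return gamma[:, None, None] * feat + beta[:, None, None]
```

In this sketch a target feature map would pass through `rectify_low_level` at an early layer and `modulate_high_level` at a later one before reaching a frozen source-trained segmentation head; how the actual framework trains and places these modules is not stated in the abstract.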
| Original language | English |
|---|---|
| Article number | 104615 |
| Journal | Computer Vision and Image Understanding |
| Volume | 263 |
| DOIs | |
| Publication status | Published - 2026 Jan |
Bibliographical note
Publisher Copyright: © 2025 Elsevier Inc.
Keywords
- Open-domain semantic segmentation
- Vision–language models
- Zero-shot domain adaptation
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
Title
Generalized prompt-driven zero-shot domain adaptive segmentation with feature rectification and semantic modulation