Abstract
Recent LiDAR-based 3D Object Detection (3DOD) methods show promising results, but they often do not generalize well to target domains outside the source (or training) data distribution. To reduce such domain gaps and thus to make 3DOD models more generalizable, we introduce a novel unsupervised domain adaptation (UDA) method, called CMDA, which (i) leverages visual semantic cues from an image modality (i.e., camera images) as an effective semantic bridge to close the domain gap in the cross-modal Bird’s Eye View (BEV) representations. Further, (ii) we also introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features, which disrupt the discrimination of whether a feature instance comes from a source or an unseen target domain. Overall, our CMDA framework guides the 3DOD model to generate highly informative and domain-adaptive features for novel data distributions. In our extensive experiments with large-scale benchmarks, such as nuScenes, Waymo, and KITTI, those mentioned above provide significant performance gains for UDA tasks, achieving state-of-the-art performance.
Original language | English |
---|---|
Pages (from-to) | 972-980 |
Number of pages | 9 |
Journal | Proceedings of the AAAI Conference on Artificial Intelligence |
Volume | 38 |
Issue number | 2 |
DOIs | |
Publication status | Published - 2024 Mar 25 |
Event | 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada Duration: 2024 Feb 20 → 2024 Feb 27 |
Bibliographical note
Publisher Copyright:© 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
ASJC Scopus subject areas
- Artificial Intelligence