SUG: SINGLE-DATASET UNIFIED GENERALIZATION FOR 3D POINT CLOUD CLASSIFICATION

Abstract

In recent years, research on zero-shot domain adaptation, namely Domain Generalization (DG), which aims to adapt a well-trained source-domain model to unseen target domains without accessing any target sample, has grown rapidly in 2D image tasks such as classification and object detection. However, its exploration on 3D point cloud data remains insufficient, challenged by more complex and uncertain cross-domain variances arising from irregular point data structures and uneven inter-class modality distributions. In this paper, unlike previous 2D DG works, we focus on the 3D DG problem and propose a Single-dataset Unified Generalization (SUG) framework that leverages only the source-domain data to alleviate the unforeseen domain differences faced by a well-pretrained source model. Specifically, we first design a Multi-grained Sub-domain Alignment (MSA) method that constrains the learned representations to be domain-agnostic and discriminative, by performing a multi-grained feature alignment process between sub-domains split from the single source dataset. Then, a Sample-level Domain-aware Attention (SDA) strategy is presented, which selectively enhances easy-to-adapt samples from different sub-domains according to the sample-level inter-domain distance, thereby avoiding negative transfer. Extensive experiments are conducted on three common 3D point cloud benchmarks. The experimental results demonstrate that the SUG framework effectively boosts model generalization to unseen target domains, even outperforming existing unsupervised domain adaptation methods that must access extensive target-domain data: we improve classification accuracy by 7.7% on the ModelNet-to-ScanNet setting and 2.3% on the ShapeNet-to-ScanNet setting. Our code will be available.

1. INTRODUCTION

As a commonly used data format describing the real world, point cloud representations preserve rich geometric information residing in 3D scenes and have become one of the most important data types for 3D scene perception and real-world applications such as robotics (Rusu et al., 2008; Rusu & Cousins, 2011), autonomous driving (Sun et al., 2020; Shi et al., 2020), and augmented and virtual reality (Tredinnick et al., 2016), giving machines a better understanding of their surrounding environment. In recent years, point cloud-based vision tasks (Shi et al., 2020) have achieved great progress on public benchmarks (Vishwanath et al., 2009; Chang et al., 2015; Dai et al., 2017), which largely owes to the fact that the collected point clouds are carefully annotated, sufficiently large, and contain little noise. In the real world, however, acquiring such data from a new target domain and manually labeling extensive 3D data depend heavily on professionals in this field, which makes data acquisition and annotation difficult, labor-intensive, and time-consuming. One effective solution for transferring a model from a fully labeled source domain to a new domain without extra human labor is Unsupervised Domain Adaptation (UDA) (Shen et al., 2022; Zou et al., 2021; Fan et al., 2022; Yang et al., 2021), whose purpose is to learn a more generalizable representation between the labeled source domain and the unlabeled target domain, such that the model can adapt to the data distribution of the target domain. For example, when the point cloud distribution of the target domain undergoes serious geometric variances (Shen et al., 2022), establishing a correct source-to-target correspondence can boost the model's adaptability. Besides, GAST (Zou et al., 2021) learns a domain-shared representation for different semantic categories, while a voting reweighting method (Fan et al., 2022) is designed to assign reliable target-domain pseudo labels.
However, these techniques depend heavily on access to target-domain data, which is a strong assumption and prerequisite for models running in unprecedented circumstances such as autonomous driving systems and medical scenarios. Thus, it is meaningful and important to investigate a model's cross-domain generalization ability under the zero-shot target-domain constraint, which gives rise to the task of Domain Generalization (DG) for the 3D scenario. Achieving such zero-shot domain adaptation, i.e., DG, is more challenging in the 3D scenario mainly for the following reasons. (1) Unknown Domain-variance Challenge: 3D point cloud data collected from different sensors or geospatial regions with different data distributions often present serious domain discrepancies. Since the target-domain data (or sensor) are inaccessible, modeling the source-to-target domain variance is infeasible. (2) Uneven Domain Adaptation Challenge: Since our goal is to learn a transferable representation that can generalize to multiple target domains, a robust model needs to perform an even domain adaptation, rather than lean toward fitting the data distribution of any single target domain. For 3D point cloud data with more complex sample-level modality variances, however, ensuring an even model adaptation under the zero-shot target-domain setting remains challenging. To tackle the above challenges, we study the typical DG problem in the 3D scenario and introduce a Single-dataset Unified Generalization (SUG) framework to address the 3D point cloud generalization problem. We study a one-to-many domain generalization problem, where the model is trained on only a single 3D dataset and is required to generalize simultaneously to multiple target datasets.
Different from previous DG works in 2D scenarios (Shankar et al., 2018; Piratla et al., 2020; Chen et al., 2021), 3D point cloud data have a more irregular data structure and a more diverse data distribution within a single dataset, which makes it possible to exploit modality and sub-domain changes without accessing any target-domain dataset. Specifically, our SUG framework consists of a Multi-grained Sub-domain Alignment (MSA) method and a Sample-level Domain-aware Attention (SDA) strategy. To address the unknown domain-variance challenge, the MSA method first splits the selected single dataset into different sub-domains. Based on these split sub-domains, the baseline model is then constrained to simulate as many domain variances as possible from multi-grained features, so that it learns multi-grained, domain-agnostic representations. To solve the uneven domain adaptation challenge, the SDA strategy is developed under the assumption that instances from different sub-domains often present different adaptation difficulties. We therefore add sample-level constraints to the whole sub-domain alignment process according to the dynamically changing sample-level inter-domain distance, leading to an even inter-domain adaptation process. We conduct extensive experiments on a common benchmark (Qin et al., 2019) under the single-dataset DG scenario, which includes three sub-datasets; our experiments cover the following three settings: 1) ModelNet-10→ShapeNet-10/ScanNet-10, meaning that the model is trained only on ModelNet-10 and directly evaluated on both ShapeNet-10 and ScanNet-10; 2) ShapeNet-10→ModelNet-10/ScanNet-10; 3) ScanNet-10→ModelNet-10/ShapeNet-10. Experimental results demonstrate the effectiveness of the SUG framework in learning generalizable features for 3D point clouds, and it also significantly boosts the DG ability of several selected baseline models.
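The sub-domain split and alignment idea can be sketched minimally as follows. This is an illustrative NumPy sketch, not the paper's implementation: the sub-domain split here uses a simple feature-norm statistic as a hypothetical stand-in for the modality-based split, and a linear-kernel MMD serves as one common choice of distribution-distance for feature alignment; the names `split_subdomains` and `mmd_linear` are our own.

```python
import numpy as np

def split_subdomains(features, num_subdomains=2):
    """Hypothetical sub-domain split: partition samples of a single
    dataset by a simple statistic (feature norm), standing in for the
    modality-based split described in the paper."""
    norms = np.linalg.norm(features, axis=1)
    order = np.argsort(norms)
    return np.array_split(order, num_subdomains)

def mmd_linear(x, y):
    """Linear-kernel Maximum Mean Discrepancy between two feature sets,
    a common distribution distance used as a domain-alignment loss."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)

# Toy usage: align two sub-domains split from one source dataset.
rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 16))        # stand-in for learned features
sub_a, sub_b = split_subdomains(feats)
loss_align = mmd_linear(feats[sub_a], feats[sub_b])
```

In the multi-grained setting described above, such an alignment term would be applied at several feature levels (e.g., local and global) rather than only once.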
The main contributions of this paper can be summarized as follows: 1) From the new perspective of one-to-many 3D DG, we explore the possibility of adapting a model from its original source domain to many unseen domains, and study how to leverage the multi-modal feature information residing in a single dataset. 2) We propose the SUG framework to tackle the one-to-many 3D DG problem. SUG consists of an MSA method that learns domain-agnostic and discriminative features during the source-domain training phase, and an SDA strategy that computes the sample-level inter-domain distance and balances the adaptation degree of sub-domains with different inter-domain distances.
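The sample-level weighting behind the SDA strategy can be illustrated with a short sketch. This is an assumption-laden toy, not the paper's method: it assigns each sample a softmax weight from its distance to the other sub-domain's centroid, so that closer (easier-to-adapt) samples are emphasized and far-away samples are down-weighted to limit negative transfer; `sample_domain_weights` and its `temperature` parameter are hypothetical names.

```python
import numpy as np

def sample_domain_weights(features, other_centroid, temperature=1.0):
    """Hypothetical sample-level domain-aware attention: samples whose
    features lie closer to the other sub-domain's centroid get larger
    weights; hard, distant samples are suppressed."""
    dists = np.linalg.norm(features - other_centroid, axis=1)
    logits = -dists / temperature          # small distance -> large logit
    w = np.exp(logits - logits.max())      # numerically stable softmax
    return w / w.sum()

# Toy usage: weight one sub-domain's samples against the other's centroid.
rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 4))
centroid = feats[0]   # pretend sample 0 coincides with the other centroid
w = sample_domain_weights(feats, centroid)
```

Such weights could then scale each sample's contribution to the alignment loss, which is the balancing role the SDA strategy plays in the contributions above.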

2. RELATED WORK

2.1. 2D IMAGE-BASED DOMAIN ADAPTATION AND GENERALIZATION

Recent Domain Adaptation (DA) works can be roughly categorized into two types: 1) Adversarial learning-based methods (Ganin & Lempitsky, 2014; Tzeng et al., 2017; Long et al., 2018b; Kang 

