SUG: SINGLE-DATASET UNIFIED GENERALIZATION FOR 3D POINT CLOUD CLASSIFICATION

Abstract

In recent years, research on zero-shot domain adaptation, namely Domain Generalization (DG), which aims to adapt a well-trained source domain model to unseen target domains without accessing any target sample, has been fast-growing in 2D image tasks such as classification and object detection. However, its exploration on 3D point cloud data is still insufficient, and is challenged by more complex and uncertain cross-domain variances arising from irregular point data structures and uneven inter-class modality distributions. In this paper, different from previous 2D DG works, we focus on the 3D DG problem and propose a Single-dataset Unified Generalization (SUG) framework that leverages only the source domain data to alleviate the unforeseen domain differences faced by a well-pretrained source model. Specifically, we first design a Multi-grained Sub-domain Alignment (MSA) method that constrains the learned representations to be domain-agnostic and discriminative, by performing a multi-grained feature alignment process between sub-domains split from the single source dataset. Then, a Sample-level Domain-aware Attention (SDA) strategy is presented, which can selectively enhance easy-to-adapt samples from different sub-domains according to the sample-level inter-domain distance, so as to avoid negative transfer. Extensive experiments are conducted on three common 3D point cloud benchmarks. The experimental results demonstrate that the SUG framework effectively boosts the model's generalization ability on unseen target domains, even outperforming existing unsupervised domain adaptation methods that must access extensive target domain data: we significantly improve classification accuracy by 7.7% on the ModelNet-to-ScanNet setting and 2.3% on the ShapeNet-to-ScanNet setting. Our code will be available.

1. INTRODUCTION

As a commonly-used data format for describing the real world, point cloud-based representations preserve rich geometric information residing in 3D scenes, and have become one of the most important data types for 3D scene perception and real-world applications such as robotics (Rusu et al., 2008; Rusu & Cousins, 2011), autonomous driving (Sun et al., 2020; Shi et al., 2020), and augmented and virtual reality (Tredinnick et al., 2016), giving machines a better understanding of the surrounding environment. In recent years, point cloud-based vision tasks (Shi et al., 2020) have achieved great progress on public benchmarks (Vishwanath et al., 2009; Chang et al., 2015; Dai et al., 2017), which largely owes to the fact that the collected point clouds are carefully annotated, sufficiently large, and contain little noise. But in the real world, acquiring such data from a new target domain and manually labeling these extensive 3D data are highly dependent on professionals in this field, which makes data acquisition and annotation difficult, labor-intensive, and time-consuming. One effective solution for transferring a model from a fully-labeled source domain to a new domain without extra human labor is Unsupervised Domain Adaptation (UDA) (Shen et al., 2022; Zou et al., 2021; Fan et al., 2022; Yang et al., 2021), whose purpose is to learn a more generalizable representation between the labeled source domain and the unlabeled target domain, such that the model can be adapted to the data distribution of the target domain. For example, when the point cloud data distribution of the target domain undergoes severe geometric variances (Shen et al., 2022), establishing a correct source-to-target correspondence can boost the model's adaptability. Besides, GAST (Zou et al., 2021) learns a domain-shared representation for different semantic categories, while a vot-

