TEST-TIME ADAPTATION WITH SLOT-CENTRIC MODELS



In our work, we find evidence that these losses can be insufficient for instance segmentation tasks unless architectural inductive biases are also taken into account. For image segmentation, recent slot-centric generative models break this dependence on supervision by attempting to segment scenes into entities in a self-supervised manner, through pixel reconstruction. Drawing on these two lines of work, we propose Slot-TTA, a semi-supervised instance segmentation model equipped with a slot-centric image or point-cloud rendering component, which is adapted per scene at test time through gradient descent on reconstruction or novel-view-synthesis objectives. We show that test-time adaptation greatly improves instance segmentation in out-of-distribution scenes. We evaluate Slot-TTA on several 2D and 3D scene instance segmentation benchmarks and show substantial out-of-distribution performance improvements over state-of-the-art supervised feed-forward detectors and self-supervised domain adaptation models.
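To make the test-time adaptation idea concrete, the following is a minimal, purely illustrative NumPy sketch: per-scene slot latents are refined by gradient descent on a reconstruction loss while the decoder (standing in for weights learned at training time) stays frozen. The linear decoder, shapes, and update rule here are toy assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not the paper's): K slots of dimension D
# decode into a P-dimensional "scene" (stand-in for pixels/points).
K, D, P = 4, 16, 8
W = rng.normal(size=(D, P))        # frozen decoder, fixed after "training"
x = rng.normal(size=P)             # an out-of-distribution test example
slots = rng.normal(size=(K, D)) * 0.1  # initial per-scene slot latents

def reconstruct(slots):
    # Each slot decodes to a component; components sum to the reconstruction.
    return (slots @ W).sum(axis=0)

def loss(slots):
    r = reconstruct(slots) - x
    return float(r @ r)            # squared reconstruction error

lr, steps = 2e-3, 500
initial = loss(slots)
for _ in range(steps):
    residual = reconstruct(slots) - x              # shape (P,)
    grad_per_slot = 2.0 * residual @ W.T           # shape (D,); in this
    # linear toy model the gradient is identical for every slot
    slots -= lr * np.tile(grad_per_slot, (K, 1))   # update slots only
final = loss(slots)
print(initial, final)
```

Only the slot latents are updated per test scene; the decoder weights are held fixed, which is what lets the reconstruction objective steer segmentation-relevant latents toward the novel input without retraining.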



Figure 1: Image and point-cloud instance segmentation with Slot-TTA. Slot-TTA parses completely novel scenes into familiar entities via slow inference, i.e., gradient descent on the reconstruction error of the scene example under consideration. Left: Slot-TTA outperforms Mask2Former (Cheng et al., 2021), a state-of-the-art 2D image segmenter, on segmenting novel images, using gradient descent on image synthesis of neighboring image views. Right: Slot-TTA outperforms a state-of-the-art 3D-DETR detector by 30% in instance segmentation accuracy on out-of-distribution 3D point clouds, when trained on the same training data.

