TOWARD LEARNING GEOMETRIC EIGEN-LENGTHS CRUCIAL FOR ROBOTIC FITTING TASKS

Abstract

A few extremely low-dimensional yet crucial geometric eigen-lengths often determine whether an object can be fitted into its environment. For example, the height of an object determines whether it can fit between the shelves of a cabinet, while the width of a couch is crucial when moving it through a doorway. Humans have internalized such crucial geometric eigen-lengths as common sense, since they serve as succinct yet effective, highly interpretable, and universal object representations. However, it remains unclear and underexplored whether learning systems can be equipped with a similar capability to automatically discover such key geometric quantities when performing robotic fitting tasks. In this work, we formulate, for the first time, a novel learning problem around this question and set up a benchmark suite, including the tasks, the data, and the evaluation metrics, for studying it. We explore potential solutions and demonstrate the feasibility of learning such eigen-lengths simply by observing successful and failed fitting trials. We also attempt geometric grounding for more accurate eigen-length measurement and study the reusability of the learned geometric eigen-lengths across multiple tasks. Our work marks a first exploratory step toward learning crucial geometric eigen-lengths, and we hope it can inspire future research on this important yet underexplored problem.

1. INTRODUCTION

Consider a robot tasked with placing many small objects on warehouse shelves, where both the objects and the shelves have diverse geometric configurations. While the robot could simply attempt the task by trial and error, it is clear to us as humans that certain placements should not be attempted because they will obviously fail. For example, we should not try to place a tall object on a shelf whose height is too low. We base this judgement on a critical geometric eigen-length, or measurement, namely the heights of the object and the shelf, whose comparison gives a quick estimate of task feasibility. Object height is one eigen-length crucial for this shelf placement task, but it is not hard to think of many other object eigen-lengths for other fitting tasks. Figure 1 presents several other example tasks together with the geometric eigen-lengths presumed from human common sense. For example, the diameter is the important eigen-length for stacking plates of different sizes (Figure 1(a)), while the width and length of an object are the crucial eigen-lengths for deciding whether an arbitrarily shaped object can be put into an open box (Figure 1(c)). Extracting such extremely low-dimensional yet crucial geometric eigen-lengths as object representations is therefore beneficial for designing learning systems for robotic fitting tasks. One telling piece of evidence is that humans have naturally built up a vocabulary of key geometric quantities, such as height, width, and diameter, when perceiving and modeling everyday objects, and use them to perform various object fitting tasks.
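To make the shelf and open-box examples concrete, here is a minimal Python sketch of hand-specified eigen-lengths: the kind of human-designed measurement that this work aims to discover automatically instead of specifying by hand. It assumes, purely for illustration, that an object is given as a list of (x, y, z) points with z pointing up, that objects and containers are axis-aligned, and that only a 90-degree turn about the vertical axis is allowed. The names `extent`, `fits_on_shelf`, and `fits_in_open_box` are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical representation: a point cloud as (x, y, z) tuples, z pointing up.
Point = Tuple[float, float, float]


def extent(points: Sequence[Point], axis: int) -> float:
    """Axis-aligned extent of a point cloud along one axis (0 = x, 1 = y, 2 = z)."""
    coords = [p[axis] for p in points]
    return max(coords) - min(coords)


def fits_on_shelf(points: Sequence[Point], shelf_gap_height: float) -> bool:
    """Shelf placement: compare the object's height (z-extent) with the gap between shelves."""
    return extent(points, 2) <= shelf_gap_height


def fits_in_open_box(points: Sequence[Point], box_width: float, box_length: float) -> bool:
    """Open-box placement: compare the object's footprint with the box opening.

    Assumes axis-aligned shapes and allows only a 90-degree turn about z,
    so general rotations (which might let more objects fit) are ignored.
    """
    width, length = extent(points, 0), extent(points, 1)
    as_is = width <= box_width and length <= box_length
    turned = width <= box_length and length <= box_width
    return as_is or turned


if __name__ == "__main__":
    # A mug-like object spanning 8 cm x 6 cm x 12 cm (only its bounding corners are listed).
    mug = [(0.0, 0.0, 0.0), (0.08, 0.06, 0.12)]
    print(fits_on_shelf(mug, 0.10))           # False: 12 cm tall, 10 cm gap
    print(fits_on_shelf(mug, 0.15))           # True
    print(fits_in_open_box(mug, 0.07, 0.09))  # True after a 90-degree turn
```

Each check reduces a full 3D shape to one or two numbers before comparing them with the scene; the question posed here is whether a learner can find such numbers on its own from fitting outcomes.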
Besides being succinct yet effective abstractions of objects for quickly estimating the feasibility of downstream fitting tasks, such crucial geometric eigen-lengths are also highly interpretable, exposing the principled reasoning behind the feasibility check, and universal, since they apply to objects of arbitrary shape and are useful across different downstream tasks. Current research in representation learning for computer vision and robotics has mostly focused on learning high-dimensional latent codes or on injecting human knowledge as an inductive bias for learning structured representations. While high-dimensional latent codes offer total flexibility to learn any feature useful for mastering the downstream tasks, they are hard to interpret and may be prone to overfitting to the training domain. As for structured representations, although researchers have explored various object representations, such as bounding boxes (Tulsiani et al., 2017) and keypoints (Manuelli et al., 2019), to accomplish downstream tasks in computer vision and robotics, these structural priors are manually specified based on human knowledge about the tasks. In contrast, we aim to explore the automatic discovery of low-dimensional yet crucial geometric quantities for robotic fitting tasks while injecting minimal human prior knowledge: we only assume that we are measuring eigen-lengths of the input objects. In this paper, we first propose a novel learning problem of discovering low-dimensional geometric eigen-lengths crucial for fitting tasks and set up a benchmark suite for studying it.
As illustrated in Figure 2, given a fitting task (putting the bowl inside the drawer of the table) that involves a scene geometry (the table) and an object shape (the bowl), we aim to predict whether the object can fit in the scene to accomplish the task by discovering a few crucial geometric eigen-lengths and composing them into a task program that outputs the final task feasibility estimate. To study the problem, we also define a set of commonly seen robotic fitting tasks, generate large-scale data for training and evaluation on each task, and set up quantitative metrics for evaluating method performance and analyzing whether the emergent geometric eigen-lengths match the ones humans usually use. We also explore potential solutions to the proposed learning problem and present several key findings. First, we show that learning such low-dimensional key geometric eigen-lengths is achievable using only weak supervision signals, such as the success or failure of training fitting trials. Second, the learned crucial geometric eigen-lengths can be measured more accurately if geometric grounding is allowed and attainable for certain fitting tasks. Finally, we make an initial attempt at sharing and reusing the learned geometric eigen-lengths across different tasks and even for novel tasks. As the first step in defining and exploring this important yet underexplored problem, we hope our work draws attention to this task and inspires future research on solutions to it.
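The pipeline described above, in which eigen-lengths are composed into a task program that outputs a feasibility estimate and learned from success/failure labels alone, can be illustrated with a minimal Python sketch. This is not the paper's method. It assumes a hypothetical program "every object eigen-length is at most the matching scene eigen-length", relaxes it with a sigmoid of temperature `tau` so that a model predicting the lengths could be trained by gradient descent, and scores the prediction with binary cross-entropy against the observed trial outcome. The function names and the `tau` value are illustrative assumptions.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_fit(obj_lengths: Sequence[float], scene_lengths: Sequence[float]) -> bool:
    """Hypothetical task program: every object eigen-length <= matching scene eigen-length."""
    return all(o <= s for o, s in zip(obj_lengths, scene_lengths))


def soft_fit_probability(obj_lengths: Sequence[float],
                         scene_lengths: Sequence[float],
                         tau: float = 0.01) -> float:
    """Differentiable relaxation of hard_fit: a product of per-length sigmoids (a soft AND)."""
    if len(obj_lengths) != len(scene_lengths):
        raise ValueError("object and scene must expose the same number of eigen-lengths")
    p = 1.0
    for o, s in zip(obj_lengths, scene_lengths):
        p *= _sigmoid((s - o) / tau)  # near 1 when the object is clearly smaller
    return p


def bce_loss(p: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted fit probability and a success (1) / failure (0) label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


if __name__ == "__main__":
    # One eigen-length per side, e.g., object height vs. shelf gap (in meters).
    print(hard_fit([0.12], [0.10]))                          # False
    print(f"{soft_fit_probability([0.12], [0.10]):.2f}")    # about 0.12
    print(bce_loss(soft_fit_probability([0.12], [0.30]), 1) < 0.01)  # True
```

At test time one would run the hard program; the soft version exists only so that a success or failure label can send gradients back to whatever network predicts the eigen-lengths. Multiplying per-length sigmoids is one simple soft AND; other relaxations, such as a soft minimum, would also work.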
To summarize, this work makes the following contributions:
• We propose a novel learning problem of discovering low-dimensional geometric eigen-lengths crucial for fitting tasks;
• We set up a benchmark suite for studying the problem, including a set of fitting tasks, the dataset for each task, and a range of quantitative and qualitative metrics for thorough performance evaluation and analysis;
• We explore potential solutions to the proposed learning problem and present key take-away messages summarizing both the successes and the unresolved challenges.

2. RELATED WORK

Learning Geometry Abstraction. A long line of research has focused on learning low-dimensional, compact abstractions of input geometry. Given a 2D or 3D shape as input, past works have studied learning various geometric abstractions as the shape representation, such as bounding boxes (Tulsiani et al., 2017; Sun et al., 2019), convex shapes (Deng et al., 2020), Gaussian mixtures (Genova et al.,



Figure 1: Example tasks and the hypothesized crucial geometric measurements by humans.

