THREE DIMENSIONAL RECONSTRUCTION OF BOTANICAL TREES WITH SIMULATABLE GEOMETRY

Anonymous

Abstract

We tackle the challenging problem of creating full and accurate three dimensional reconstructions of botanical trees with the topological and geometric accuracy required for subsequent physical simulation, e.g. in response to wind forces. Although certain aspects of our approach would benefit from various improvements, our results exceed the state of the art especially in geometric and topological complexity and accuracy. Starting with two dimensional RGB image data acquired from cameras attached to drones, we create point clouds, textured triangle meshes, and a simulatable and skinned cylindrical articulated rigid body model. We discuss the pros and cons of each step of our pipeline, and in order to stimulate future research we make the raw and processed data from every step of the pipeline as well as the final geometric reconstructions publicly available.

1. INTRODUCTION

Human-inhabited outdoor environments typically contain ground surfaces such as grass and roads, transportation vehicles such as cars and bikes, buildings and structures, and humans themselves, but are also typically intentionally populated by a large number of trees and shrubbery; most of the motion in such environments comes from humans, their vehicles, and wind-driven plants/trees. Tree reconstruction and simulation are obviously useful for AR/VR, architectural design and modeling, film special effects, etc. For example, when filming actors running through trees, one would like to create virtual versions of those trees with which a chasing dinosaur could interact. Other uses include studying roots and plants for agriculture (Zheng et al., 2011; Estrada et al., 2015; Fuentes et al., 2017) or assessing the health of trees especially in remote locations (similar in spirit to Zuffi et al. (2018)). 2.5D data, i.e. 2D images with some depth information, is typically sufficient for robotic navigation, etc.; however, there are many problems that require true 3D scene understanding to the extent that one could 3D print objects and have accurate geodesics. Whereas navigating around objects might readily generalize into categories or strategies such as 'move left,' 'move right,' 'step up,' 'go under,' etc., the 3D object understanding required for picking up a cup, knocking down a building, moving a stack of bricks or a pile of dirt, or simulating a tree moving in the wind requires significantly higher fidelity. As opposed to random trial and error, humans often use mental simulations to better complete a task, e.g. consider stacking a card tower, avoiding a falling object, or hitting a baseball (visualization is quite important in sports); thus, physical simulation can play an important role in end-to-end tasks.foot_1 Even so, accurate 3D shape reconstruction is still quite challenging.
Recently, Malik arguedfoot_0 that one should not apply general purpose reconstruction algorithms to, say, a car and a tree and expect both reconstructions to be of high quality. Rather, he said that one should use domain-specific knowledge as he has done, for example, in Kanazawa et al. (2018). Another example of this specialization strategy is to rely on the prior that many indoor surfaces are planar in order to reconstruct office spaces (Huang et al., 2017) or entire buildings (Armeni et al., 2016; 2017). Along the same lines, Zuffi et al. (2018) uses a base animal shape as a prior for their reconstructions of wild animals. Thus, we similarly take a specialized approach using a generalized cylinder prior for both large and medium scale features. In Section 3, we discuss our constraints on data collection as well as the logistics behind the choices we made for the hardware (cameras and drones) and software (structure from motion, multi-view stereo, inverse rendering, etc.) used to obtain our raw and processed data. Section 4 discusses our use of machine learning, and Section 5 presents a number of experimental results. In Appendices A, B, and C we describe how we create geometry from the data with enough efficacy for physical simulation.

2. PREVIOUS WORK

Tree Modeling and Reconstruction: Researchers in computer graphics have been interested in modeling trees and plants for decades (Lindenmayer, 1968; Bloomenthal, 1985; Weber & Penn, 1995; Prusinkiewicz et al., 1997; Stava et al., 2014). SpeedTreefoot_2 is probably the most popular software utilized, and their group has begun to consider the incorporation of data-driven methods. Amongst the data-driven approaches, Tan et al. (2007) is most similar to ours, combining point cloud and image segmentation data to build coarse-scale details of a tree; however, they generate fine-scale details procedurally using a self-similarity assumption and image-space growth constraints, whereas we aim to capture more accurate finer structures from the image data. Other data-driven approaches include Livny et al. (2010), which automatically estimates skeletal structure of trees from point cloud data, Xie et al. (2015), which builds tree models by assembling pieces from a database of scanned tree parts, etc. Many of these specialized, data-driven approaches for trees are built upon more general techniques such as the traditional combination of structure from motion (see e.g. Wu (2013)) and multi-view stereo (see e.g. Furukawa & Ponce (2010)). In the past, researchers studying 3D reconstruction have engineered general approaches to reconstruct fine details of small objects captured by sensors in highly controlled environments (Seitz et al., 2006). At the other end of the spectrum, researchers have developed approaches for reconstructing building- or even city-scale objects using large amounts of image data available online (Agarwal et al., 2009). Our goal is to obtain a 3D model of a tree with elements from both of these approaches: the scale of a large structure with the fine details of its many branches and twigs. However, unlike in general reconstruction approaches, we cannot simply collect images online or capture data using a high-end camera.
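The structure-from-motion and multi-view stereo pipelines cited above rest on multi-view triangulation: recovering a 3D point from its projections in two or more calibrated views. As an illustrative sketch only (toy camera matrices, not the actual software used in our pipeline), the classic linear (DLT) triangulation step can be written as:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image coordinates.
    Each observation contributes two rows of a homogeneous system A X = 0."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    """Project a 3D point X through a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Toy setup: a canonical camera and a second camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
X_rec = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free observations the recovered point matches the ground truth to machine precision; real pipelines additionally handle feature matching, outlier rejection, and bundle adjustment across many views.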
To address similar challenges in specialized cases, researchers take advantage of domain-specific prior knowledge. Zhou et al. (2008) uses a generalized cylinder prior (similar to us) for reconstructing tubular structures observed during medical procedures and illustrates that this approach performs better than simple structure from motion. The process of creating a mesh that faithfully reflects topology and subsequently refining its geometry is similar in spirit to Xu et al. (2018), which poses a human model first via its skeleton and then by applying fine-scale deformations.

Learning and Networks: So far, our use of networks is limited to segmentation tasks, where we rely on segmentation masks for semi-automated tree branch labeling. Due to difficulties in getting sharp details from convolutional networks, the study of network-based segmentation of thin structures is still an active field in itself; there has been recent work on designing specialized multiscale architectures (Ronneberger et al., 2015; Lin et al., 2017; Qu et al., 2018) and also on incorporating perceptual losses (Johnson et al., 2016) during network training (Mosinska et al., 2018).
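The generalized cylinder prior mentioned above can be made concrete with a small sketch: sweep circular cross-sections of varying radius along a polyline skeleton to obtain a tube mesh. The function below is a minimal illustration of this idea (our own simplified construction, not the reconstruction code from the paper); it uses a per-ring frame without parallel transport, so rings may twist on sharply curving skeletons.

```python
import numpy as np

def generalized_cylinder(centerline, radii, n_sides=8):
    """Sweep circular cross-sections along a polyline skeleton.
    centerline: (N, 3) points; radii: (N,) radius at each point.
    Returns (vertices, list of quad faces as vertex-index tuples)."""
    centerline = np.asarray(centerline, dtype=float)
    tangents = np.gradient(centerline, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    verts = []
    for c, t, r in zip(centerline, tangents, radii):
        # Build an orthonormal frame (u, v) perpendicular to the tangent t.
        helper = np.array([1.0, 0.0, 0.0])
        if abs(t @ helper) > 0.9:  # avoid a helper parallel to t
            helper = np.array([0.0, 1.0, 0.0])
        u = np.cross(t, helper)
        u /= np.linalg.norm(u)
        v = np.cross(t, u)
        for k in range(n_sides):
            a = 2.0 * np.pi * k / n_sides
            verts.append(c + r * (np.cos(a) * u + np.sin(a) * v))
    # Connect consecutive rings with quads.
    faces = []
    for i in range(len(centerline) - 1):
        for k in range(n_sides):
            a = i * n_sides + k
            b = i * n_sides + (k + 1) % n_sides
            faces.append((a, b, b + n_sides, a + n_sides))
    return np.array(verts), faces

# A short straight branch segment with constant radius.
verts, faces = generalized_cylinder(
    np.array([[0, 0, 0], [0, 0, 1], [0, 0, 2]]), [0.1, 0.1, 0.1])
```

Tapering radii along the skeleton yields branch-like shapes; an articulated rigid body model can then be obtained by treating each skeleton segment as a rigid link.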

3. RAW AND PROCESSED DATA

As a case study, we select a California oak (Quercus agrifolia) as our subject for tree reconstruction and simulation (see Figure 1). The sheer size of this tree imposes a number of restrictions on our data capture: one has to deal with an outdoor, unconstrained environment; wind and branch motion will be an issue; it will be quite difficult to observe higher portions of the tree, especially at close proximity; there will be an immense number of occluded regions because of the large number of branches that one cannot see from any feasible viewpoint; etc. In an outdoor setting, commodity structured light sensors that use infrared light (e.g. the Kinect) fail to produce reliable depth maps as their projected pattern is washed out by sunlight; thus, we opted to use standard RGB cameras. Because we want good coverage of the tree, we cannot simply capture images from the ground; instead, we mounted our cameras on a quadcopter drone that was piloted around the tree. The decision to use a drone introduces additional constraints: the cameras must be



foot_0: Jitendra Malik, Stanford cs231n guest lecture, May 2018
foot_2: https://speedtree.com



foot_1: e.g. see Kloss et al. (2017); Peng et al. (2017); Jiang & Liu (2018) for examples of combining simulation and learning.

