INTEGRATING SYMMETRY INTO DIFFERENTIABLE PLANNING WITH STEERABLE CONVOLUTIONS

Abstract

In this paper, we study a principled approach to incorporating group symmetry into end-to-end differentiable planning algorithms and explore the benefits of symmetry in planning. To achieve this, we draw inspiration from equivariant convolution networks and model the path planning problem as a set of signals over grids. We demonstrate that value iteration can be treated as a linear equivariant operator, which is effectively a steerable convolution. Building upon Value Iteration Networks (VIN), we propose a new Symmetric Planning (SymPlan) framework that incorporates rotation and reflection symmetry using steerable convolution networks. We evaluate our approach on four tasks: 2D navigation, visual navigation, 2 degrees of freedom (2-DOF) configuration space manipulation, and 2-DOF workspace manipulation. Our experimental results show that our symmetric planning algorithms significantly improve training efficiency and generalization performance compared to non-equivariant baselines, including VIN and GPPN.

1. INTRODUCTION

Model-based planning algorithms can struggle to find solutions for complex problems, and one remedy is to plan in a more structured and reduced space (Sutton and Barto, 2018; Li et al., 2006; Ravindran and Barto, 2004; Fox and Long, 2002). When a task exhibits symmetry, this structure can be used to effectively reduce the search space for planning. However, existing planning algorithms often assume perfect knowledge of dynamics and require building equivalence classes, which can be inefficient and limit their applicability to specific tasks (Fox and Long, 1999; 2002; Pochter et al., 2011; Zinkevich and Balch, 2001; Narayanamurthy and Ravindran, 2008).

In this paper, we study the path-planning problem and its symmetry structure, as shown in Figure 1. Given a map M (top row), the objective is to find optimal actions A = SymPlan(M) (bottom row) to a given position (red dots). If we rotate the map to obtain g.M (top right), its solution g.A (shortest path) is related to the original solution A by the same rotation. Specifically, we say the task has symmetry since the solutions SymPlan(g.M) = g.SymPlan(M) are related by a ⟲ 90° rotation. As a more concrete example, the action in the NW corner of A is the same as the action in the SW corner of g.A, after also rotating the arrow ⟲ 90°. This is an example of symmetry appearing in a specific task, which can be observed before solving the task or assuming other domain knowledge.

Recently, symmetry in model-free deep reinforcement learning (RL) has also been studied (Mondal et al., 2020; van der Pol et al., 2020a; Wang et al., 2021). A core benefit of model-free RL that enables great asymptotic performance is its end-to-end differentiability. However, these methods lack long-horizon planning ability and only effectively handle pixel-level symmetry, such as flipping or rotating image observations and actions together.
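The relation SymPlan(g.M) = g.SymPlan(M) from the Figure 1 discussion can be checked numerically on a small grid. The sketch below is only an illustration under stated assumptions: the planner is a plain breadth-first search rather than the learned SymPlan model, the map, goal, and action encoding are hypothetical, and each cell stores the set of all optimal actions so that arbitrary tie-breaking cannot hide the symmetry.

```python
from collections import deque

# Actions as (d_row, d_col): north, east, south, west (hypothetical encoding).
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def rot90_grid(grid):
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[j][n - 1 - i] for j in range(n)] for i in range(n)]


def rot90_cell(cell, n):
    """Position of a cell (row, col) after the same rotation."""
    r, c = cell
    return (n - 1 - c, r)


def rot90_action(action):
    """Rotate an action vector by the same 90 degrees (east becomes north)."""
    dr, dc = action
    return (-dc, dr)


def plan(obstacles, goal):
    """Return, per cell, the set of actions lying on a shortest path to goal."""
    n = len(obstacles)
    dist = [[None] * n for _ in range(n)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:  # breadth-first search outward from the goal
        r, c = queue.popleft()
        for dr, dc in ACTIONS:
            rr, cc = r + dr, c + dc
            if (0 <= rr < n and 0 <= cc < n
                    and not obstacles[rr][cc] and dist[rr][cc] is None):
                dist[rr][cc] = dist[r][c] + 1
                queue.append((rr, cc))
    policy = [[set() for _ in range(n)] for _ in range(n)]
    for r in range(n):
        for c in range(n):
            if dist[r][c] is None:
                continue  # obstacle or unreachable cell keeps an empty set
            for dr, dc in ACTIONS:
                rr, cc = r + dr, c + dc
                if (0 <= rr < n and 0 <= cc < n
                        and dist[rr][cc] is not None
                        and dist[rr][cc] == dist[r][c] - 1):
                    policy[r][c].add((dr, dc))
    return policy


def rot90_policy(policy):
    """Apply g to an action field: move the cells and rotate the arrows."""
    moved = rot90_grid(policy)
    return [[{rot90_action(a) for a in cell} for cell in row] for row in moved]


# Hypothetical 5x5 map: 1 marks an obstacle, 0 marks free space.
M = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
goal = (0, 4)
n = len(M)

plan_of_rotated_map = plan(rot90_grid(M), rot90_cell(goal, n))  # plan(g.M)
rotated_plan = rot90_policy(plan(M, goal))                      # g.plan(M)
print(plan_of_rotated_map == rotated_plan)
```

Note that the action field is transformed in two ways at once: the cells move on the grid, and the arrows stored in them are rotated as well. This is the same two-part action on the NW and SW corner arrows described for Figure 1.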
This motivates us to combine the spirit of both: can we enable end-to-end differentiable planning algorithms to make use of symmetry in environments? In this work, we propose a framework called Symmetric Planning (SymPlan) that enables planning with symmetry in an end-to-end differentiable manner while avoiding the explicit construction of equivalence classes for symmetric states.

Our framework is motivated by work in the equivariant network and geometric deep learning community (Bronstein et al., 2021a; Cohen et al., 2020; Kondor and Trivedi, 2018; Cohen and Welling, 2016a; b; Weiler and Cesa, 2021), which views geometric data as signals (or "steerable feature fields") over a base space. For instance, an RGB image is represented as a signal that maps Z^2 to R^3. The theory of equivariant networks enables the injection of symmetry into operations between signals through equivariant operations, such as convolutions. Equivariant networks applied to images do not need to explicitly consider "symmetric pixels" while still ensuring symmetry properties, thus avoiding the need to search symmetric states.

We apply this intuition to the task of path planning, which is both straightforward and general. Specifically, we focus on the 2D grid and demonstrate that value iteration (VI) for 2D path planning is equivariant under translations, rotations, and reflections (which are isometries of Z^2). We further show that VI for path planning is a type of steerable convolution network, as developed by Cohen and Welling (2016a). To implement this approach, we use the Value Iteration Network (VIN; Tamar et al., 2016a) and its variants, since they require only operations between signals. We equip VIN with steerable convolution to create the equivariant steerable version of VIN, named SymVIN, and we use a variant called GPPN (Lee et al., 2018) to build SymGPPN.
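To make the claim that value iteration acts equivariantly on signals over the grid more concrete, the following sketch writes one Bellman backup as a shift followed by a max. It is a simplified illustration, not VIN's learned convolution: the deterministic four-action dynamics, the discount `GAMMA`, the padding value `PAD`, and the example reward and value grids are all assumptions made here. The point it shows is that the Q-values form a four-channel field that transforms by rotating each channel spatially and permuting the action channels, while the value field simply rotates.

```python
GAMMA = 0.9   # discount factor (assumed for this sketch)
PAD = -1e9    # value assigned to moves that leave the grid (assumed)

# Actions as (d_row, d_col): north, east, south, west.
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def rot90_grid(grid):
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[j][n - 1 - i] for j in range(n)] for i in range(n)]


def q_backup(reward, value):
    """One channel per action: Q_a(s) = R(s) + GAMMA * V(s + a)."""
    n = len(value)
    q = []
    for dr, dc in ACTIONS:
        channel = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < n and 0 <= cc < n
                next_value = value[rr][cc] if inside else PAD
                channel[r][c] = reward[r][c] + GAMMA * next_value
        q.append(channel)
    return q


def v_backup(q):
    """V'(s) = max over actions of Q_a(s)."""
    n = len(q[0])
    return [[max(ch[r][c] for ch in q) for c in range(n)] for r in range(n)]


def rot90_q(q):
    """Rotate a Q field: rotate each channel and permute the action channels.

    The rotation sends ACTIONS[k] to ACTIONS[(k - 1) % 4] (east becomes
    north, and so on), so output channel k is read from channel (k + 1) % 4.
    """
    return [rot90_grid(q[(k + 1) % 4]) for k in range(4)]


# Hypothetical 4x4 example: step cost -1 everywhere except a goal cell.
n = 4
reward = [[-1.0] * n for _ in range(n)]
reward[0][3] = 0.0
value = [[float(r * n + c) for c in range(n)] for r in range(n)]

q_of_rotated = q_backup(rot90_grid(reward), rot90_grid(value))
rotated_q = rot90_q(q_backup(reward, value))
print(q_of_rotated == rotated_q)
print(v_backup(q_of_rotated) == rot90_grid(v_backup(rotated_q and q_backup(reward, value))))
```

Each action channel here is a correlation of the value field with a one-hot 3x3 kernel, which is why the backup can be phrased as a convolution whose output channels are tied to the action set.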
Both SymPlan methods significantly improve training efficiency and generalization performance on previously unseen random maps, which highlights the advantage of exploiting symmetry from environments for planning.

Our contributions include:

• We introduce a framework for incorporating symmetry into path-planning problems on 2D grids, which is directly generalizable to other homogeneous spaces.
• We prove that value iteration for path planning can be treated as a steerable CNN, motivating us to implement SymVIN by replacing the 2D convolution with steerable convolution.
• We show that both SymVIN and a related method, SymGPPN, offer significant improvements in training efficiency and generalization performance for 2D navigation and manipulation tasks.
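As a rough illustration of what replacing an ordinary 2D convolution with a rotation-equivariant one involves, the sketch below implements a C_4 lifting convolution: the same 3x3 kernel is applied in its four rotated copies, producing one output channel per orientation. This is a common special case built on the regular representation, chosen here for brevity; it is not the authors' SymVIN implementation, and the function names, kernel, and input are hypothetical. Integer values are used so that equality checks are exact.

```python
def rot90_grid(grid):
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[j][n - 1 - i] for j in range(n)] for i in range(n)]


def correlate(x, kernel):
    """3x3 cross-correlation with zero padding (output has the input size)."""
    n = len(x)
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = 0
            for i in (-1, 0, 1):
                for j in (-1, 0, 1):
                    rr, cc = r + i, c + j
                    if 0 <= rr < n and 0 <= cc < n:
                        total += kernel[i + 1][j + 1] * x[rr][cc]
            out[r][c] = total
    return out


def lifting_conv(x, kernel):
    """Correlate x with the kernel rotated 0, 1, 2, 3 times (four channels)."""
    outputs = []
    rotated_kernel = kernel
    for _ in range(4):
        outputs.append(correlate(x, rotated_kernel))
        rotated_kernel = rot90_grid(rotated_kernel)
    return outputs


def rot90_lifted(feature):
    """Rotate a four-orientation field: rotate each channel spatially and
    shift the orientation channels cyclically by one."""
    return [rot90_grid(feature[(k - 1) % 4]) for k in range(4)]


# Hypothetical asymmetric kernel and input.
K = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [[1, 0, 2, 0],
     [0, 3, 0, 0],
     [4, 0, 0, 5],
     [0, 0, 6, 0]]

# Equivariance of the lifting convolution.
print(lifting_conv(rot90_grid(x), K) == rot90_lifted(lifting_conv(x, K)))

# For comparison: a single ordinary correlation with the same kernel.
print(correlate(rot90_grid(x), K) == rot90_grid(correlate(x, K)))
```

The first check compares the output for a rotated input with the rotated output for the original input; the second runs the same comparison for one unconstrained kernel, where nothing ties the result to the rotation unless the kernel itself happens to be rotation-invariant.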

2. RELATED WORK

Planning with symmetries. Symmetries are prevalent in various domains and have been used in classical planning algorithms and model checking (Fox and Long, 1999; 2002; Pochter et al., 2011; Shleyfman et al., 2015; Sievers et al., 2015; Sievers; Winterer et al.; Röger et al., 2018; Sievers et al., 2019; Fišer et al., 2019). Invariance of the value function for a Markov Decision Process (MDP) with symmetry has been shown by Zinkevich and Balch (2001), while Narayanamurthy and Ravindran (2008) proved that finding exact symmetry in MDPs is graph-isomorphism complete. However, classical planning algorithms like A* have a fundamental issue with exploiting symmetries. They construct equivalence classes of symmetric states, which explicitly represent states and introduce symmetry breaking. As a result, they are intractable (NP-hard) in maintaining symmetries in trajectory rollout and forward search (for large state spaces and symmetry groups) and are incompatible with differentiable pipelines for representation learning. This limitation hinders their wider applications in reinforcement learning (RL) and robotics.

Figure 1: Symmetry in path planning. The Symmetric Planning approach guarantees that the solutions are the same up to rotations.

If we can use the rotation (and reflection) symmetry in this task, we effectively reduce the search space by a factor of |C_4| = 4 (or |D_4| = 8). In contrast, classic planning algorithms like A* would require searching symmetric states (NP-hard) with known dynamics (Pochter et al., 2011).

[...] (Ravindran and Barto, 2004; Ferns et al., 2004; Li et al., 2006). However, they require perfect MDP dynamics and do not scale up well, typically because of the complexity in maintaining abstraction mappings (homomorphisms) and abstracted MDPs. van der Pol et al. (2020b) integrate symmetry into model-free RL based on MDP homomorphisms (Ravindran and Barto, 2004) and motivate us to consider planning. Park et al. (2022) learn equivariant transition models, but do not consider planning. Additionally, the typical formulation of symmetric MDPs in (Ravindran and Barto, 2004; van der Pol et al., 2020a; Zhao et al., 2022) is slightly different from our formulation here: we consider sym-

