Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search

Abstract

Complex reasoning problems contain states that vary in the computational cost required to determine the right action plan. To take advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism is employed to swiftly filter out unreachable subgoals, making it possible to focus the search on feasible ones. In this way, AdaSubS benefits from the efficiency of planning with longer-term subgoals and the fine control afforded by shorter-term ones, and thus scales well to difficult planning problems. We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and the inequality-proving benchmark INT.

1. INTRODUCTION

When solving hard problems, people often try to decompose them into smaller parts that are typically easier to complete (Hollerman et al., 2000). Similarly, subgoal search methods aim to solve complex tasks by considering intermediate subgoals leading towards the main goal. Besides their intuitive appeal, such approaches offer many practical advantages. Most notably, they enable deeper search within a smaller computational budget and reduce the negative impact of approximation errors. Subgoal search methods powered by deep learning have shown promising results for continuous control tasks, such as robotic arm manipulation (Nair & Finn, 2020; Jayaraman et al., 2019; Fang et al., 2019) and navigation (Kim et al., 2019; Savinov et al., 2018). Recently, Czechowski et al. (2021) showed that using a subgoal generator can significantly improve search efficiency on discrete domains with high combinatorial complexity.

This paper uses Czechowski et al. (2021) as a starting point and pushes forward, building upon the following observation: many complex reasoning problems contain states that vary in complexity, measured by the computational cost required to determine the right action plan. To illustrate this, imagine driving a car. When traversing a narrow, winding street, it is crucial to focus on the closest events: the next turn, the next car to avoid, etc. However, after entering a straight, empty street, it is enough to think about reaching its far end. This suggests that carefully balancing the subgoal distance is desirable: selecting longer-term subgoals, if possible, to advance faster towards the goal, and choosing shorter-term subgoals to power through the harder states. Hence, the question arises whether it is possible, and if so how, to incorporate such an adaptive subgoal generation procedure into subgoal search methods. In this paper, we answer this question affirmatively.

Figure: An illustrative example of adaptive planning. The planner may choose long-distance subgoals in the easier areas (e.g., the leftmost part) and use short distances in the hard areas (e.g., the middle part).

We propose Adaptive Subgoal Search (AdaSubS), a novel planning algorithm that adaptively chooses from subgoals with different horizons. Our method benefits both from the efficiency of planning with longer-term subgoals and from the reliability of shorter-term ones. AdaSubS prioritizes further distances, retracting to shorter ranges only when stuck. Additionally, we introduce a verifier network, which assesses whether a proposed subgoal is valid and reachable. The verifier makes it possible to efficiently discard faulty subgoals, which are common and more costly to detect at longer horizons. AdaSubS is a data-driven algorithm whose key components are implemented as learnable deep models. In most cases, we use general-purpose transformer architectures to model the subgoal generators and the verifier network. We train these models on offline data. We show the effectiveness of AdaSubS in three challenging domains: Sokoban, the Rubik's Cube, and the inequality theorem prover INT (Wu et al., 2021). AdaSubS significantly surpasses hierarchical planning algorithms and sets a new state of the art on INT.

Our main contributions are:

1. We propose Adaptive Subgoal Search (AdaSubS), a new algorithm that adjusts the planning horizon to take into account the varying complexity of the state space.

2. We present a comprehensive study of adaptive methods, showing that they outperform similar algorithms without adaptation. Amongst these, AdaSubS is the best choice across environments and planning budgets.

3. We observe a strong indication of out-of-distribution generalization. AdaSubS trained on proofs of length 15 in INT (the longest considered in the literature so far) retains more than 50% of its performance when the proof length is increased two-fold.
The code of our method is available at https://github.com/AdaptiveSubgoalSearch/adaptive_subs.
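To make the search procedure concrete, the adaptive loop described above can be sketched as a best-first search over subgoals, where longer-horizon subgoals receive better priority (so the search retracts to shorter horizons only when longer ones fail) and a verifier prunes candidates before the more expensive low-level planner is invoked. The function names, the interfaces, and the particular priority scheme below are illustrative assumptions, not the paper's actual implementation:

```python
import heapq
from itertools import count

def adaptive_subgoal_search(start, is_solved, generators, verifier,
                            reach, max_nodes=1000):
    """Best-first search sketch in the spirit of AdaSubS (hypothetical API).

    generators: dict mapping subgoal distance k -> function(state) that
        returns candidate subgoals roughly k steps away.
    verifier:   function(state, subgoal) -> bool; a cheap validity check
        that filters out unreachable subgoals early.
    reach:      function(state, subgoal) -> list of actions or None; the
        low-level planner connecting a state to a verified subgoal.
    """
    tiebreak = count()  # makes heap entries comparable without comparing states
    frontier = [(0, next(tiebreak), start, [])]
    visited = {start}
    while frontier and max_nodes > 0:
        priority, _, state, plan = heapq.heappop(frontier)
        max_nodes -= 1
        if is_solved(state):
            return plan
        for k in sorted(generators, reverse=True):  # longest horizon first
            for subgoal in generators[k](state):
                if subgoal in visited or not verifier(state, subgoal):
                    continue  # discard faulty subgoals cheaply
                actions = reach(state, subgoal)
                if actions is None:
                    continue
                visited.add(subgoal)
                # Larger k yields a smaller (better) priority value, so
                # shorter horizons are explored only when longer ones stall.
                heapq.heappush(frontier, (priority - k, next(tiebreak),
                                          subgoal, plan + actions))
    return None
```

As a toy usage example, searching on integer states with generators at distances 4 and 1, a verifier that rejects overshooting subgoals, and a trivial low-level planner finds a plan from 0 to 10 while expanding mostly distance-4 subgoals.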

2. RELATED WORK

The combination of planning algorithms with deep learning is an active area of research. It has provided impressive results in, e.g., automated theorem proving (Polu & Sutskever, 2020), chess and Go (Silver et al., 2017), the Atari benchmark (Schrittwieser et al., 2019), and video compression (Mandhane et al., 2022). In the field of hierarchical planning, the majority of deep-learning-based methods have focused on visual domains (Kim et al., 2019; Pertsch et al., 2020a; Jayaraman et al., 2019; Fang et al., 2019) or on landmark-based navigation methods (Liu et al., 2020a; Gao et al., 2017; Zhang et al., 2020). This body of work often relies on variational autoencoders for the compression of visual observations and uses planning mechanisms suitable for continuous control settings. Allen et al. (2021) generate macro-actions that help to speed up the search. This differs from our work, as we use learning to generate subgoals (as opposed to action sequences) and the process is agnostic with respect to the size of the action space. Such approaches have been shown to work on domains with limited combinatorial complexity.



There exist many approaches to hierarchical planning utilizing different temporal distances. Kim et al. (2019) and Pertsch et al. (2020b) use hierarchical variational models to learn the temporal structure of tasks by reconstructing visual state sequences. Pertsch et al. (2020a), Parascandolo et al. (2020), and Jurgenson et al. (2020) recursively construct a plan by generating subgoals in the middle between the existing ones.

