Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search

Abstract

Complex reasoning problems contain states that vary in the computational cost required to determine the right action plan. To take advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism swiftly filters out unreachable subgoals, making it possible to focus on feasible, more distant subgoals. In this way, AdaSubS benefits from the efficiency of planning with longer-term subgoals and the fine control of shorter-term ones, and thus scales well to difficult planning problems. We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and the inequality-proving benchmark INT.

1. INTRODUCTION

When solving hard problems, people often try to decompose them into smaller parts that are typically easier to complete (Hollerman et al., 2000). Similarly, subgoal search methods aim to solve complex tasks by considering intermediate subgoals leading towards the main goal. Besides their intuitive appeal, such approaches offer many practical advantages. Most notably, they enable deeper search within a smaller computational budget and reduce the negative impact of approximation errors. Subgoal search methods powered by deep learning have shown promising results for continuous control tasks, such as robotic arm manipulation (Nair & Finn, 2020; Jayaraman et al., 2019; Fang et al., 2019) and navigation (Kim et al., 2019; Savinov et al., 2018). Recently, Czechowski et al. (2021) showed that using a subgoal generator can significantly improve search efficiency on discrete domains with high combinatorial complexity. This paper uses Czechowski et al. (2021) as a starting point and builds upon the following observation: many complex reasoning problems contain states that vary in complexity, measured by the computational cost required to determine the right action plan. To illustrate this, imagine driving a car. When traversing a narrow, winding street, it is crucial to focus on the closest events: the next turn, the next car to avoid, etc. However, after entering a straight, empty street, it

