MULTI-TASK OPTION LEARNING AND DISCOVERY FOR STOCHASTIC PATH PLANNING

Abstract

This paper addresses the problem of reliably and efficiently solving broad classes of long-horizon stochastic path planning problems. Starting with a vanilla RL formulation with a stochastic dynamics simulator and an occupancy matrix of the environment, our approach computes useful options together with their policies, as well as high-level paths that compose the discovered options. Our main contributions are (1) data-driven methods for creating abstract states that serve as endpoints for useful options, (2) methods for computing option policies using auto-generated option guides in the form of dense pseudo-reward functions, and (3) an overarching algorithm for composing the computed options. We show that this approach yields strong guarantees of executability and solvability: under fairly general conditions, the computed option guides lead to composable option policies and consequently ensure downward refinability. Empirical evaluation on a range of robots, environments, and tasks shows that this approach effectively transfers knowledge across related tasks and that it outperforms existing approaches by a significant margin.
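For intuition, the following minimal sketch (in Python; not the exact construction used in this paper) illustrates one plausible form of an option guide realized as a dense pseudo-reward: it penalizes the distance from the current state to the option's target abstract state, modeled here as an axis-aligned region, and adds a bonus on reaching it. The region bounds, the bonus magnitude, and the function name are illustrative assumptions.

```python
import numpy as np

def option_guide(state: np.ndarray,
                 region_low: np.ndarray,
                 region_high: np.ndarray) -> float:
    """Illustrative dense pseudo-reward for one option: negative distance
    from `state` to a target abstract state (an axis-aligned region here),
    plus a bonus once the region is reached. Names and constants are
    assumptions, not the paper's exact formulation."""
    # Closest point of the target region to `state` (equals `state` if inside).
    clipped = np.clip(state, region_low, region_high)
    dist = float(np.linalg.norm(state - clipped))
    reached_bonus = 10.0 if dist == 0.0 else 0.0
    return -dist + reached_bonus

# Example: 2-D robot position, target region = a corridor's bounding box.
s = np.array([1.0, 4.0])
print(option_guide(s, np.array([2.0, 3.0]), np.array([3.0, 5.0])))  # -1.0
```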

1. INTRODUCTION

Autonomous robots must compute long-horizon motion plans (or path plans) to accomplish their tasks. Robots execute these motion plans with low-level controllers that drive the system toward each point in the plan. However, the physical dynamics can be noisy, and controllers are not always able to achieve precise trajectory targets. This prevents robots from deterministically reaching a goal while executing the computed motion plan and increases the complexity of the motion planning problem. Several approaches (Schaul et al., 2015; Pong et al., 2018) have used reinforcement learning (RL) to solve multi-goal stochastic path planning problems by learning goal-conditioned reactive policies. However, these approaches work only for short-horizon problems (Eysenbach et al., 2019). On the other hand, multiple approaches have been designed for handling stochasticity in motion planning (Alterovitz et al., 2007; Sun et al., 2016), but they require discrete action spaces. This paper addresses the following question: Can we develop effective approaches that efficiently compute plans for long-horizon continuous stochastic path planning problems? In this paper, we show that we can develop such an approach by learning abstract states and then learning options that serve as actions between these abstract states.

Abstractions play an important role in long-horizon planning. Temporally abstracted high-level actions reduce the horizon of the problem and thereby the complexity of the overall decision-making problem. E.g., a task of reaching a location in a building can be solved using abstract actions such as "go from room A to corridor A" and "reach the elevator from corridor A", if one can automatically identify these regions of saliency. Each of these actions is a temporally abstracted action. Not only do these actions reduce the complexity of the problem, but they also allow the transfer of knowledge across multiple tasks. E.g., if we learn how to reach room B from room A for one task, we can reuse the same solution whenever this abstract action is required to solve another task (a schematic sketch of this composition appears below).

Reinforcement learning allows learning policies that account for the stochasticity of the environment. Recent work (Lyu et al., 2019; Yang et al., 2018; Kokel et al., 2021) has shown that combining RL with abstractions and symbolic planning enables robots to solve long-horizon problems that require complex reasoning. However, these approaches require hand-coded abstractions. In this paper, we learn the required abstractions and options automatically.
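To make the composition concrete, the following schematic sketch (ours, not the paper's implementation) models each learned option as a named policy paired with a termination condition that fires inside the option's target abstract state; a high-level path is then executed by running each option's policy until it terminates and handing control to the next option. The `env.step` interface returning the next (stochastic) state, the step budget, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

State = Any
Action = Any

@dataclass
class Option:
    name: str                            # e.g., "roomA->corridorA"
    policy: Callable[[State], Action]    # learned low-level policy
    terminated: Callable[[State], bool]  # True inside the target abstract state

def execute_plan(env, state: State, plan: List[Option],
                 max_steps: int = 1000) -> State:
    """Run each option's policy until its termination condition fires,
    then hand off to the next option in the high-level path.
    Assumes `env.step(action)` returns the next (stochastic) state."""
    for option in plan:
        for _ in range(max_steps):
            if option.terminated(state):
                break
            state = env.step(option.policy(state))
    return state
```

Because each option terminates in a well-defined abstract state, an option learned for one task (e.g., "roomA->roomB") can be dropped unchanged into the plan of any other task that passes through the same abstract states.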

