PLANSFORMER: GENERATING SYMBOLIC PLANS USING TRANSFORMERS

Abstract

Large Language Models (LLMs) have been the subject of active research, significantly advancing the field of Natural Language Processing (NLP). From BERT to BLOOM, LLMs have surpassed state-of-the-art results on various natural language tasks such as question answering, summarization, and text generation. Many ongoing efforts focus on understanding LLMs' capabilities, including their knowledge of the world, syntax, and semantics. However, extending the textual prowess of LLMs to symbolic reasoning has been slow and has predominantly focused on problems from the mathematical domain. In this paper, we explore the use of LLMs for automated planning, a branch of AI concerned with the realization of action sequences (plans) to achieve a goal, typically executed by intelligent agents, autonomous robots, and unmanned vehicles. We introduce Plansformer¹, an LLM fine-tuned on planning problems that generates plans with favorable correctness and length while requiring reduced knowledge-engineering effort. We also demonstrate the adaptability of Plansformer to different planning domains of varying complexity, owing to the transfer learning abilities of LLMs. For one configuration of Plansformer, we achieve 97% valid plans, of which 95% are optimal, on Towers of Hanoi, a puzzle-solving domain.

1. INTRODUCTION

Large Language Models (LLMs), based on the transformer (neural) architecture (Vaswani et al., 2017; Devlin et al., 2018; Brown et al., 2020; Chowdhery et al., 2022), have significantly advanced the field of Natural Language Processing (NLP). Their use has grown dramatically in recent years (Li, 2022), as researchers develop newer and bigger LLMs. From BERT to the most recent BLOOM, language models have surpassed state-of-the-art results on various natural language tasks. For example, PaLM (Chowdhery et al., 2022) achieved breakthrough performance on a plethora of natural language tasks such as inference, question answering, and commonsense reasoning, and outperformed average human performance on the BIG-bench benchmark. Despite the textual prowess of LLMs, their impact has been limited in domains that involve symbols. For example, results on symbolic domains such as mathematics (Hendrycks et al., 2021b; Cobbe et al., 2021) and coding problems (Hendrycks et al., 2021a; Chen et al., 2021) illustrate the failures of LLMs when it comes to handling symbols. In automated planning, Valmeekam et al. (2022) suggest that even state-of-the-art LLMs cannot reason with symbolic data and offer a new suite of benchmarks to test their reasoning capabilities. Recently, there has been considerable interest in LLMs for code generation, for example, CodeT5 (Wang et al., 2021), CodeBERT (Feng et al., 2020), and Codex (Chen et al., 2021). In this paper, we propose to take LLMs that are trained to generate code and repurpose them to generate valid plans. To advance research in LLM-based automated planning, we create training and test datasets for several planning domains. We use CodeT5 (base), a transformer-based code generation model that achieves state-of-the-art results on CodeXGLUE, as the pre-trained LLM.
We select CodeT5 due to its ability to generate goal-directed, sequential instructions and semantically meaningful program code under syntactic and structural constraints. We then present Plansformer, an LLM trained to generate symbolic plans of high quality in terms of correctness and length. Our experimental results indicate that the syntactic/symbolic knowledge learned from different programming languages in the CodeT5 model can be beneficial for the PDDL-based automated planning task. For example, in one configuration of Plansformer tested on a puzzle-solving domain, Towers of Hanoi (hanoi), our model generated 97% valid plans, of which 95% are shortest-length plans. These results reveal a promising direction for harnessing LLMs for symbolic tasks such as planning. In the remainder of the paper, we present preliminaries on automated planning and language models and then propose an LLM-based planner called Plansformer. Next, we present experimental results comparing our approach with state-of-the-art planners and other large language models. Furthermore, we demonstrate the ability of Plansformer to adapt to other domains and discuss its relevance to instruction generation. We conclude with a discussion of the results and a presentation of ongoing work.

2. BACKGROUND

2.1. AUTOMATED PLANNING

Given initial and goal states, alongside a set of legal actions, the objective of a planning agent is to devise a sequence of actions that advances the agent from the initial to the goal state. This paper adopts the notation of the Planning Domain Definition Language (PDDL) (McDermott et al., 1998; Fox & Long, 2003). In PDDL, a planning environment is described in terms of objects in the world, predicates that describe relations between these objects, and actions that modify the world by manipulating these relations. The output plan consists of a series of time steps, each of which can have one or more instantiated actions with concurrency semantics (Ghallab et al., 2004). A planner devises plans by searching in a space of states, where a state is a configuration of physical objects, or in a space of partial plans. In the most basic formulation, called classical planning, there is a single agent; actions have unit cost, take constant time to execute, and have deterministic effects; the world is fully observable, subject to domain-specific conditions/constraints; and all goals have to be achieved (Ghallab et al., 2004). In more sophisticated planning settings, many of these conditions are relaxed: there may be multiple agents, the cost and duration of actions can be non-uniform, action effects can be non-deterministic, the world can be partially observable, and the agent may maximize as many goals as it can achieve within a given time and resource budget.
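As a toy illustration of classical planning as search in the space of states (not code from the paper), consider 2-disk Towers of Hanoi, the hanoi domain used in our experiments. A state records the peg of each disk; breadth-first search over legal moves returns a shortest, and hence optimal, plan:

```python
from collections import deque

# Illustrative sketch only: 2-disk Towers of Hanoi as state-space search.
# state[d] is the peg (0-2) of disk d, with disk 0 the smallest.

def legal_moves(state):
    """Yield (action, successor) pairs for all legal disk moves."""
    for disk, peg in enumerate(state):
        # only the topmost disk on a peg may move
        if any(state[d] == peg for d in range(disk)):
            continue
        for target in range(3):
            if target == peg:
                continue
            # a disk may not land on a smaller disk
            if any(state[d] == target for d in range(disk)):
                continue
            action = ("move", disk, peg, target)
            yield action, state[:disk] + (target,) + state[disk + 1:]

def bfs_plan(initial, goal):
    """Breadth-first search; returns a shortest action sequence or None."""
    frontier, seen = deque([(initial, [])]), {initial}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for action, nxt in legal_moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

plan = bfs_plan((0, 0), (2, 2))  # move both disks from peg 0 to peg 2
```

Because BFS explores states by increasing depth, the first plan it returns is optimal; for two disks it has the well-known minimal length of 2² − 1 = 3 moves.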

2.2. LARGE LANGUAGE MODELS AND SYMBOLIC TASKS

Large Language Models (LLMs) such as BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019), and GPT-3 (Brown et al., 2020), which are pre-trained on extensive unstructured knowledge from public data such as Wikipedia, BookCorpus, and Common Crawl, have shown impressive results on several NLP tasks. They have demonstrated the ability to generalize to multiple tasks, from question answering and machine translation to story generation and instruction following (Wang et al., 2018; 2019). LLMs have shown the ability to generate output in natural language (Wolf et al., 2020; Raffel et al., 2020), adapt to novel tasks in a zero- or few-shot manner (Brown et al., 2020; Radford et al., 2019), and decode with constraints on the output space (Hokamp & Liu, 2017; Welleck et al., 2019; Kumar et al., 2021). Recent progress in LLMs has demonstrated the generation of structured output that requires precise syntactic/symbolic knowledge and structural constraints, such as knowledge graphs (Petroni et al., 2019), protein structure (Unsal et al., 2022; Ferruz & Höcker, 2022), and programming languages (Ahmad et al., 2021). Because LLMs absorb much of the knowledge needed to solve NLP tasks, Petroni et al. (2019) have shown that LLMs can serve as substantial knowledge bases. On protein data (Unsal et al., 2022; Ferruz & Höcker, 2022), LLMs predict functional properties of proteins by enforcing structural constraints specific to protein science and capturing complex functional relationships in protein binding. Code generation has recently become very popular in the LLM research community. Several models such as CodeBERT (Feng et al., 2020), Codex (Chen et al., 2021), and CodeT5 (Wang et al., 2021) have shown significant improvement in transferring from models pre-trained on natural language to structured code. One of the key contributors to the success of these LLMs in code generation is fine-tuning the models on task-specific data.
For instance, CodeXGLUE (Lu et al., 2021), a benchmark dataset for code understanding and generation with sample code from several programming languages, is used to fine-tune CodeBERT, CodeT5, and others. In this paper, we harness CodeT5 for further fine-tuning on the classical automated planning domain due to its ability to generate goal-directed, sequential instructions and semantically meaningful program code with syntactic and structural constraints.
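Fine-tuning a seq2seq code model on planning data requires linearizing each PDDL instance into a (prompt, target) pair. The sketch below shows one hypothetical way to do this; the tag tokens (`<GOAL>`, `<INIT>`, `<ACTION>`) and the blocksworld snippets are illustrative assumptions, not the encoding actually used by Plansformer.

```python
# Hypothetical linearization of a PDDL planning instance into a
# (prompt, target) training pair for a seq2seq code model such as CodeT5.
# Tag tokens and predicates here are illustrative, not the paper's format.

def make_example(goal, init, actions, plan):
    prompt = (
        "<GOAL> " + " ".join(goal)
        + " <INIT> " + " ".join(init)
        + " <ACTION> " + " <ACTION> ".join(actions)
    )
    target = ", ".join(plan)  # the plan sequence the model learns to emit
    return prompt, target

prompt, target = make_example(
    goal=["(on a b)"],
    init=["(clear a)", "(clear b)", "(ontable a)", "(ontable b)", "(handempty)"],
    actions=["pick-up", "put-down", "stack", "unstack"],
    plan=["pick-up a", "stack a b"],
)
```

Flattening the structured problem into a single token sequence lets the model be trained with the same conditional-generation objective used for code, with the plan string as the decoding target.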



¹ The dataset, code, and all fine-tuned model checkpoints will be released for public use.

