FORWARD PREDICTION FOR PHYSICAL REASONING

Abstract

Physical reasoning requires forward prediction: the ability to forecast what will happen next given some initial world state. We study the performance of state-of-the-art forward-prediction models on the complex physical-reasoning tasks of the PHYRE benchmark (Bakhtin et al., 2019). We do so by incorporating models that operate on object-based or pixel-based representations of the world into simple physical-reasoning agents. We find that forward-prediction models can improve physical-reasoning performance, particularly on complex tasks that involve many objects. However, we also find that these improvements are contingent on the test tasks being small variations of the training tasks, and that generalization to completely new task templates is challenging. Surprisingly, we observe that forward predictors with better pixel accuracy do not necessarily lead to better physical-reasoning performance. Nevertheless, our best models set a new state of the art on the PHYRE benchmark.

1. INTRODUCTION

When presented with a picture of a Rube Goldberg machine, we can predict how the machine works. We do so by using our intuitive understanding of concepts such as force, mass, energy, and collisions to imagine how the machine's state would evolve once released. This ability allows us to solve real-world physical-reasoning tasks, such as how to strike a billiard ball so that it ends up in the pocket, or how to balance the weight of two children on a see-saw. In contrast, the physical-reasoning abilities of machine-learning models have largely been limited to closed domains, such as predicting the dynamics of multi-body gravitational systems (Battaglia et al., 2016), the stability of block towers (Lerer et al., 2016), or the physical plausibility of observed dynamics (Riochet et al., 2018). In this work, we explore the use of imaginative, forward-prediction approaches to solve complex physical-reasoning puzzles. We study modern object-based (Battaglia et al., 2016; Sanchez-Gonzalez et al., 2020; Watters et al., 2017) and pixel-based (Finn et al., 2016; Ye et al., 2019; Hafner et al., 2020) forward-prediction models in simple search-based agents on the PHYRE benchmark (Bakhtin et al., 2019). PHYRE tasks involve placing one or two balls in a 2D world, such that the world reaches a state with a particular property (e.g., two balls are touching) after being played forward. PHYRE tasks are very challenging because small changes in the action (or the world) can have a very large effect on the efficacy of an action; see Figure 1 for an example. Moreover, PHYRE tests models' ability to generalize to completely new physical environments at test time, a significantly harder task than in prior work, which mostly varies the number or properties of objects in the same environment. As a result, physical-reasoning agents may struggle even when their forward-prediction model works well. Nevertheless, our best agents substantially outperform the prior state-of-the-art on PHYRE.
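The search-based-agent setup described above can be illustrated with a minimal sketch: score each candidate action by rolling a learned forward-prediction model out in imagination and checking whether the resulting state satisfies the goal. The `forward_model` and `goal_fn` signatures below are hypothetical placeholders, not the paper's actual interfaces.

```python
def rollout(forward_model, state, action, horizon):
    """Imagine the future: apply the learned forward-prediction model
    repeatedly to the current state for `horizon` steps."""
    for _ in range(horizon):
        state = forward_model(state, action)
    return state


def search_agent(forward_model, goal_fn, init_state, candidate_actions, horizon=10):
    """Score every candidate action by its imagined outcome and
    return the highest-scoring one (e.g., the one that best brings
    two target balls into contact in a PHYRE-like task)."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        final_state = rollout(forward_model, init_state, action, horizon)
        score = goal_fn(final_state)  # higher = closer to solving the task
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

In practice the candidate actions would be sampled from PHYRE's continuous action space (ball position and radius), and the scoring function would be learned or derived from the benchmark's goal predicate; this sketch only shows the control flow of imagination-based search.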
Specifically, we find that forward-prediction models can improve the performance of physical-reasoning agents when the models are trained on tasks that are very similar to the tasks that need to be solved at test time. However, forward-prediction-based agents struggle to generalize to truly unseen tasks, presumably because small deviations in the forward predictions compound over time. We also observe that better forward prediction does not always lead to better physical-reasoning performance on PHYRE (cf. Buesing et al. (2018) for similar observations in RL). In particular, we find that object-based forward-prediction models make more accurate forward predictions, but pixel-based models are more helpful for physical reasoning. This observation may be the result of two key advantages of models that use pixel-based state representations. First, in fully observable 2D environments like PHYRE, it is easier to determine whether a task is solved from a pixel-based representation than from an object-based one. Second, pixel-based models facilitate end-to-end training of the

