LEARNING NOT TO LEARN: NATURE VERSUS NURTURE IN SILICO

Abstract

Animals are equipped with a rich innate repertoire of sensory, behavioral and motor skills, which allows them to interact with the world immediately after birth. At the same time, many behaviors are highly adaptive and can be tailored to specific environments by means of learning. In this work, we use mathematical analysis and the framework of memory-based meta-learning (or 'learning to learn') to answer when it is beneficial to learn such an adaptive strategy and when to hard-code a heuristic behavior. We find that the interplay of ecological uncertainty, task complexity and the agents' lifetime has crucial effects on the meta-learned amortized Bayesian inference performed by an agent. There exist two regimes: one in which meta-learning yields a learning algorithm that implements task-dependent information integration, and a second regime in which meta-learning imprints a heuristic or 'hard-coded' behavior. Further analysis reveals that non-adaptive behaviors are not only optimal for aspects of the environment that are stable across individuals, but also in situations where an adaptation to the environment would in fact be highly beneficial, but could not be done quickly enough to be exploited within the remaining lifetime. Hard-coded behaviors should hence not only be those that always work, but also those that are too complex to be learned within a reasonable time frame.

1. INTRODUCTION

The 'nature versus nurture' debate (e.g., Mutti et al., 1996; Tabery, 2014) - the question of which aspects of behavior are 'hard-coded' by evolution and which are learned from experience - is one of the oldest and most controversial debates in biology. Evolutionary principles prescribe that hard-coded behavioral routines should be those for which there is no benefit in adaptation. This is believed to be the case for behaviors whose evolutionary advantage varies little among individuals of a species. Mating instincts or flight reflexes are general solutions that rarely present an evolutionary disadvantage. On the other hand, features of the environment that vary substantially for individuals of a species potentially call for adaptive behavior (Buss, 2015). Naturally, the same principles should apply not only to biological but also to artificial agents. But how can a reinforcement learning agent differentiate between these two behavioral regimes? A promising approach to automatically learn rules of adaptation that facilitate environment-specific specialization is meta-learning (Schmidhuber, 1987; Thrun & Pratt, 1998). At its core lies the idea of using generic optimization methods to learn inductive biases for a given ensemble of tasks. In this approach, the inductive bias usually has its own set of parameters (e.g., weights in a recurrent network; Hochreiter et al., 2001) that are optimized on the whole task ensemble, that is, on a long, 'evolutionary' time scale. These parameters in turn control how a different set of parameters (e.g., activities in the network) are updated on a much faster time scale. These rapidly adapting parameters then allow the system to adapt to the specific task at hand. Notably, the parameters of the system that are subject to 'nature' - i.e., those that shape the inductive bias and are common across tasks - and those that are subject to 'nurture' are usually predefined from the start.
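To make the two-timescale structure concrete, the following minimal sketch (not from the original work; all names are illustrative) replaces the meta-learned recurrent update with a hand-written Beta-Bernoulli belief update. The update rule plays the role of the slow 'nature' parameters, fixed across all tasks, while the sufficient statistics `(alpha, beta)` play the role of the fast 'nurture' variables - analogous to recurrent activities - that adapt within a single lifetime:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_lifetime(p_reward, n_steps, alpha0=1.0, beta0=1.0):
    """Within-lifetime adaptation for one Bernoulli bandit task.

    The update rule below stands in for the meta-learned 'slow'
    weights: it is identical for every task. The statistics
    (alpha, beta) are the 'fast' state, updated from experience.
    """
    alpha, beta = alpha0, beta0          # fast state (analogue of RNN activities)
    for _ in range(n_steps):
        r = rng.random() < p_reward      # Bernoulli observation from this task
        alpha += r                       # conjugate Beta posterior update
        beta += 1 - r
    return alpha / (alpha + beta)        # posterior mean reward estimate

# Two 'individuals' sharing the same rule adapt to different environments:
est_hi = run_lifetime(p_reward=0.9, n_steps=500)
est_lo = run_lifetime(p_reward=0.1, n_steps=500)
```

With enough steps, each individual's estimate converges toward its own environment's reward probability, even though nothing task-specific is stored in the shared rule - the adaptation lives entirely in the fast variables, which is the sense in which meta-learned recurrent agents perform amortized Bayesian inference.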
In this work, we use the memory-based meta-learning approach for a different goal, namely to acquire a qualitative understanding of which aspects of behavior should be hard-coded and which should be adaptive. Our hypothesis is that meta-learning can not only learn efficient learning algorithms, but can also decide not to be adaptive at all, and to instead apply a generic heuristic to the whole task ensemble.

