NEURAL REPRESENTATIONS IN MULTI-TASK LEARNING GUIDED BY TASK-DEPENDENT CONTEXTS

Abstract

The ability to switch between tasks effectively in response to external stimuli is a hallmark of cognitive control. Our brain is able to filter and integrate external information to accomplish goal-directed behavior. Task switching occurs rapidly and efficiently, allowing us to perform multiple tasks with ease. In a similar way, artificial neural networks can be tailored to exhibit multi-task capabilities and achieve high performance across domains. In terms of explainability, understanding how neural networks make predictions is crucial in many real-world applications, for instance, in guiding clinical decisions. In this study, we delve into the neural representations learned by multi-tasking architectures. Concretely, we compare individual and parallel networks with task-switching networks. Task-switching networks leverage task-dependent contexts to learn disentangled representations without hurting overall task accuracy. We show that task-switching networks operate in an intermediate regime between individual and parallel networks. In addition, we show that shared representations are produced by the emergence of neurons encoding multiple tasks. Furthermore, we study the role of contexts across network processing and show their role in aligning the task with the relevant features. Finally, we investigate how the magnitude of contexts affects performance in task-switching networks.

1. INTRODUCTION

Living involves constantly gauging and selecting the optimal task to perform. This decision results from the interaction of different elements, such as our current goals, external circumstances, or the stimulus context (Monsell, 2003). In the brain, task switching and cognitive control have been associated principally with the prefrontal cortex (PFC), which provides top-down regulation to other cortical brain areas (Johnston et al., 2007). In this sense, the PFC controls multiple neural pathways that are activated or deactivated to ultimately produce task execution. Neural pathways are formed by collections of neurons that cooperate to yield a specific effect. Although single neurons have historically been regarded as the centre of processing in the brain, we are now moving towards a framework in which neural networks constitute the functional processing unit of the nervous system (Yuste, 2015). This is evident in the visual cortex, where subpopulations of neurons are responsible for processing different visual-field features to support correct object recognition (DiCarlo et al., 2012). According to the gating theory (Miller et al., 2001), context-dependent top-down inputs from the PFC regulate the activation of neural pathways. The encoding of relationships between stimuli and contexts can be facilitated by tuning neural activity to multiple tasks. In the PFC of monkeys, single neurons have been observed to exhibit nonlinear responses to multiple stimuli (Rigotti et al., 2013). This behavior, known as mixed selectivity, favors high-dimensional representations of neural activity, which allow linear readouts to generate a vast number of responses, in opposition to low-dimensional representations (Fusi et al., 2016). Recently, artificial neural networks have been revisited as models of neural computation, with many findings suggesting their practicality for assessing brain theories (Richards et al., 2019).
For example, the internal representations learned by neural networks have been associated with representations in the brain in multi-tasking settings (Ito et al., 2022; Ito & Murray, 2021; Flesch et al., 2022a;b). In this paper, we investigate the neural representations learned by feedforward multi-tasking architectures. Neural networks can be designed to process multiple tasks in parallel, which benefits network performance (Caruana, 1997) and enables high performance across domains (Ruder, 2017). Here, we focus on neural networks that use contexts to switch attention between tasks. We use population analysis tools to investigate how neural computations are associated with task stimuli (Kriegeskorte et al., 2008; Jazayeri & Ostojic, 2021) and describe the advantages of learning representations with task-switching networks. Contexts have also been used to alleviate catastrophic forgetting in continual learning. Masse et al. (2018) showed that adding context to parameter-stabilization methods improved accuracy compared to parameter stabilization alone. Serra et al. (2018) implemented gating through task-based attention mechanisms to help preserve information about previous tasks without compromising the learning of new tasks. Grewal et al. (2021) combined synaptic intelligence with active dendrites and sparse activations to reduce catastrophic forgetting. The idea of recruiting multiple subnetworks for continual learning was previously explored in Wortsman et al. (2020). In addition, Li et al. (2016) proposed multi-bias non-linear activations to improve feature extraction in convolutional neural networks, and, for semantic segmentation, Sun et al. (2021) introduced task-switching networks using task embeddings to promote learning common parameters across tasks within the same network.

1.2. OUR CONTRIBUTIONS

The main contributions of this paper are two-fold: 1. Firstly, we investigate the representations learned by three different variations of feedforward networks. We find that task-switching networks operate in an intermediate regime between individual and parallel networks, and that performance on all tasks improves when parameter sharing is present (section 3.1). 2. Secondly, we expand previous analyses involving multi-task learning with contexts and mixed selectivity (section 3.3) and report new findings on the impact of context location and magnitude at different stages of processing (sections 3.2, 3.4, 3.5).

2.1. ARCHITECTURES

We conducted experiments with three variants of the feedforward network (Rumelhart et al., 1986; Goodfellow et al., 2016) in multi-task learning: 1. Individual Networks: Each task is learned by an independent network, and multi-tasking is performed by combining the outputs of the networks. An individual network is parameterized as y_t = f_t(x; θ_t), where t ∈ T denotes the task in the set T = {T_A, T_B, ..., T_N} of N tasks, and θ_t denotes the task-specific weights and biases, W_t and b_t, for each layer. Parameters are therefore independent and not shared across tasks.
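The individual-network variant above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the layer sizes, initialization scale, and function names are illustrative assumptions. It shows the defining property of the variant: one fully independent parameter set θ_t per task, with multi-tasking realized by evaluating each network separately and combining the outputs.

```python
import numpy as np

def init_mlp(rng, n_in, n_hidden, n_out):
    """One task's parameters theta_t: weights and biases for a two-layer MLP."""
    return {
        "W1": rng.standard_normal((n_in, n_hidden)) * 0.1,
        "b1": np.zeros(n_hidden),
        "W2": rng.standard_normal((n_hidden, n_out)) * 0.1,
        "b2": np.zeros(n_out),
    }

def forward(theta, x):
    """y_t = f_t(x; theta_t): ReLU hidden layer followed by a linear readout."""
    h = np.maximum(0.0, x @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]

rng = np.random.default_rng(0)
tasks = ["TA", "TB"]
# One independent parameter set per task: nothing is shared across tasks.
params = {t: init_mlp(rng, n_in=4, n_hidden=8, n_out=3) for t in tasks}

x = rng.standard_normal(4)
# Multi-tasking is performed by combining the per-task outputs.
outputs = {t: forward(params[t], x) for t in tasks}
```

Because each θ_t is drawn and trained separately, gradients for one task can never affect another task's parameters, which is the baseline against which parameter-sharing architectures are compared.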



RELATED WORK

Neural networks with contexts have been used as models to study cognitive control in machines and humans. Mante et al. (2013) reproduced PFC dynamics from monkeys using recurrent neural networks with sensory contexts. Similarly, Ardid & Wang (2013) analyzed task-switching behavioral effects, such as switch and congruency effects, emerging from network attractor dynamics. Musslick et al. (2017) studied the learning efficiency of neural networks in multi-tasking with contexts and tasks with varying degrees of overlap. Flesch et al. (2022a) analyzed the geometry of representations learned by neural networks and humans in a task-switching schedule. Later, Flesch et al. (2022b) modified the stochastic gradient descent algorithm to strengthen task-relevant features in a continual learning setting. More recently, Ito et al. (2022) studied generalization to new tasks by composing old tasks using different rule-based contexts. Ito & Murray (2021) used a neural network to investigate the transformation mapping between visual and motor representations occurring in the brain, applying representational similarity analysis to study the geometry of neural codes in multi-tasking (Kriegeskorte et al., 2008; Kriegeskorte & Kievit, 2013).
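The context-dependent gating idea running through these works can be sketched schematically. The snippet below is purely illustrative and does not reproduce any cited model: a task context selects a binary mask over a shared hidden layer, so each task routes activity through its own (possibly overlapping) subnetwork. The mask density and layer sizes are arbitrary assumptions.

```python
import numpy as np

def gated_forward(x, context, W, b, G):
    """Context-dependent gating (schematic): a task context multiplicatively
    gates hidden units, routing activity through a task-specific subnetwork."""
    h = np.maximum(0.0, x @ W + b)  # shared hidden layer
    gate = G[context]               # binary mask selected by the task context
    return h * gate                 # units outside the subnetwork are silenced

rng = np.random.default_rng(1)
n_in, n_hidden, n_tasks = 4, 10, 2
W = rng.standard_normal((n_in, n_hidden)) * 0.1
b = np.zeros(n_hidden)
# Each task activates a different, possibly overlapping, subset of units.
G = (rng.random((n_tasks, n_hidden)) < 0.5).astype(float)

x = rng.standard_normal(4)
h0 = gated_forward(x, 0, W, b, G)  # activity under task context 0
h1 = gated_forward(x, 1, W, b, G)  # activity under task context 1
```

Multiplicative gates of this form are one simple way to give a single shared network task-dependent pathways, in the spirit of the top-down gating theory discussed in the introduction.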

