LEARNING EXPLANATIONS THAT ARE HARD TO VARY

Abstract

In this paper, we investigate the principle that good explanations are hard to vary in the context of deep learning. We show that averaging gradients across examples, akin to a logical OR (∨) of patterns, can favor memorization and 'patchwork' solutions that sew together different strategies instead of identifying invariances. To inspect this, we first formalize a notion of consistency for minima of the loss surface, which measures to what extent a minimum appears only when examples are pooled. We then propose and experimentally validate a simple alternative algorithm based on a logical AND (∧), which focuses on invariances and prevents memorization in a set of real-world tasks. Finally, using a synthetic dataset with a clear distinction between invariant and spurious mechanisms, we dissect learning signals and compare this approach to well-established regularizers.



Consider the top of Figure 1, which shows a view from above of the loss surface obtained as we vary a two-dimensional parameter vector θ = (θ_1, θ_2), for a fictional dataset containing two observations x_A and x_B. Note the two global minima on the top-right and bottom-left. Depending on the initial values of θ, marked as white circles, gradient descent converges to one of the two minima. Judging solely by the value of the loss function, which is zero in both cases, the two minima look equally good. However, looking at the loss surfaces for x_A and x_B separately, as shown below, a crucial difference between those two minima appears: Starting from the same initial parameter configurations and following the gradient of the loss, ∇_θ L(θ, x_i), the probability of finding the same minimum on the top-right in either case is zero. In contrast, the minimum in the lower-left corner has a significant overlap across the two loss surfaces, so gradient descent can converge to it even when training on x_A (or x_B) only. Note that after averaging there is no way to tell what the two loss surfaces looked like: Are we destroying information that is potentially important? In this paper, we argue that the answer is yes. In particular, we hypothesize that if the goal is to find invariant mechanisms in the data, these can be identified by finding explanations (e.g. model parameters) that are hard to vary across examples.

A notion of invariance implies something that stays the same as something else changes. We assume that data comes from different environments: An invariant mechanism is shared across all of them and generalizes out of distribution (o.o.d.), but might be hard to model; each environment also has spurious explanations that are easy to spot ('shortcuts'), but do not generalize o.o.d. From the point of view of causal modeling, such invariant mechanisms can be interpreted as conditional distributions of the targets given causal features of the inputs; invariance of such conditionals is expected if they represent causal mechanisms, that is, stable properties of the physical world (see e.g. Hoover (1990)). Generalizing o.o.d. therefore means that the predictor should perform equally well on data coming from different settings, as long as they share the causal mechanisms.

We formalize a notion of consistency, which characterizes to what extent a minimum of the loss surface appears only when data from different environments are pooled. Minima with low consistency are 'patchwork' solutions, which (we hypothesize) sew together different strategies and should not be expected to generalize to new environments. An intuitive description of this principle was proposed by physicist David Deutsch: "good explanations are hard to vary" (Deutsch, 2011). Using the notion of consistency, we define Invariant Learning Consistency (ILC), a measure of the expected consistency of the solution found by a learning algorithm on a given hypothesis class. The ILC can be improved by changing the hypothesis class or the learning algorithm, and in the last part of the paper we focus on the latter. We then analyse why current practices in deep learning provide little incentive for networks to learn invariances, and show that standard training is instead set up with the explicit objective of greedily maximizing speed of learning, i.e., progress on the training loss. When learning "as fast as possible" is not the main objective, we show that we can trade off some "learning speed" to prioritize learning the invariances.
A practical instantiation of ILC leads to o.o.d. generalization on a challenging synthetic task where several established regularizers fail to generalize; moreover, following the memorization task from Zhang et al. (2017), ILC prevents convergence on CIFAR-10 with random labels, as no shared mechanism is present, and similarly when a portion of training labels is incorrect. Lastly, we set up a behavioural cloning task based on the game CoinRun (Cobbe et al., 2019b), and observe better generalization on new unseen levels.

(Figure: pages from two second-hand chess puzzle books, 'Mate in 1' and 'White to Move #2' problems, one annotated with arrows and the other with handwritten solutions.) An example. Take these two second-hand books of chess puzzles. We can learn the two independent shortcuts (blue arrows for the left book OR handwritten solutions on the right), or actually learn to play chess (the invariant mechanism). While both strategies solve other problems from the same books (i.i.d.), only the latter generalises to new chess puzzle books (o.o.d.). How to distinguish the two? We would not have learned about the arrows had we trained on the book on the right, and vice versa for the handwritten notes.

2. EXPLANATIONS THAT ARE HARD TO VARY

We consider datasets {D^e}_{e∈E}, with |E| = d, and D^e = {(x_i^e, y_i^e)}, i = 1, ..., n_e. Here x_i^e ∈ X ⊆ R^m is the vector containing the observed inputs, and y_i^e ∈ Y ⊆ R^p the targets. The superscript e ∈ E indexes some aspect of the data collection process, and can be interpreted as an environment label. Our objective is to infer a function f : X → Y, which we call the mechanism, assigning a target y_i^e to each input x_i^e; as explained in the introduction, we assume that this function is shared across all environments. For estimation purposes, f may be parametrized by a neural network with continuous activations; for weights θ ∈ Θ ⊆ R^n, we denote the neural network output at x ∈ X as f_θ(x).

Gradient-based optimization. To find an appropriate model f_θ, standard optimizers rely on gradients from a pooled loss function L : R^n → R. This function measures the average performance of the neural network when predicting data labels, across all environments:

    L(θ) := (1/|E|) Σ_{e∈E} L^e(θ),   with   L^e(θ) := (1/n_e) Σ_{(x_i^e, y_i^e)∈D^e} ℓ(f(x_i^e; θ), y_i^e),

where ℓ : R^p × R^p → [0, +∞) is usually chosen to be the L2 loss or the cross-entropy loss. The parameter updates according to gradient descent (GD) are given by θ^{k+1}_GD = θ^k_GD − η ∇L(θ^k_GD), where η > 0 is the learning rate. Under some standard assumptions (Lee et al., 2016), (θ^k_GD)_{k≥0} converges to a local minimizer of L with probability one.

When do we not learn invariances? We start by describing what might prevent learning invariances in standard gradient-based optimization. (i) Training stops once the loss is low enough. If optimization learned spurious patterns by the time it converged, invariances will not be learned anymore. This depends on the rate at which different patterns are learned. The rate at which invariant patterns emerge (and, vice versa, spurious patterns do not) can be improved by, e.g.: (a) careful architecture design, for instance by hardcoding spatial equivariance in convolutional networks; (b) fine-tuning models pre-trained on large amounts of data, where strong features have already emerged and can be readily selected. (ii) Learning signals: everything looks relevant for a dataset of size 1. Due to the summation in the definition of the pooled loss L, gradients for each example are computed independently. Informally, each signal is identical to the one for an equivalent dataset of size 1, where every pattern appears relevant to the task. (After computing the gradients for a dataset of n − 1 examples, if an n-th example appeared, we would just compute one more gradient vector and add it to the sum; a Gaussian Process (Rasmussen, 2003), for example, would require recomputing the entire solution from scratch, as all interactions are considered.) To find invariant patterns across examples, if we compute our training signals on each of them independently, we have to rely on the way these signals are aggregated. (iii) Aggregating gradients: averaging maximizes learning speed. The default method to pool gradients is the arithmetic mean. GD applied to L is designed to minimize the pooled loss by prioritizing descent speed; the same reasoning holds for SGD in the finite-sum case L = (1/m) Σ_{i=1}^m L_i, where gradients from a mini-batch are seen as unbiased estimators of gradients from the pooled loss (Bottou et al., 2018). Indeed, a step of GD is equivalent to finding a tight quadratic upper bound to L and then jumping to the minimizer of this approximation (Nocedal and Wright, 2006): assuming L has L-Lipschitz gradients (i.e., curvature bounded from above by L), at any point θ̃ we can construct the upper bound L_θ̃(θ) = L(θ̃) + ∇L(θ̃)^T (θ − θ̃) + (L/2) ‖θ − θ̃‖². While speed is often desirable, by construction GD ignores one potentially crucial piece of information: The gradient ∇L is the result of averaging the signals ∇L^e, which correspond to the patterns visible from each environment at this stage of optimization. In other words, GD with averaged gradients greedily maximizes learning speed, but in some situations we would like to trade some convergence speed for invariance.
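For concreteness, here is a minimal NumPy sketch of this standard pipeline (the linear model, squared loss, and toy environments are our own illustrative choices, not the setup used in the experiments): per-environment gradients are computed independently and then pooled by an arithmetic mean before each GD step.

```python
import numpy as np

# Two toy "environments": same mechanism y = x @ w_true, different input distributions.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
envs = []
for scale in (1.0, 3.0):
    x = rng.normal(scale=scale, size=(50, 2))
    y = x @ w_true
    envs.append((x, y))

def env_gradient(theta, x, y):
    """Gradient of the mean squared error L^e(theta) for one environment."""
    residual = x @ theta - y
    return 2.0 * x.T @ residual / len(x)

# Standard training: average the per-environment gradients (the arithmetic mean)
# and take a gradient-descent step on the pooled loss.
theta = np.zeros(2)
eta = 0.01
for _ in range(200):
    grads = np.stack([env_gradient(theta, x, y) for x, y in envs])
    theta = theta - eta * grads.mean(axis=0)

print(theta)  # close to w_true, since the shared mechanism here is exactly linear
```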
For instance, instead of performing an arithmetic mean between gradients (a logical OR), we might want to look towards a logical AND, which can be characterized as a geometric mean. (Loosely speaking, a sum is large if any of the summands is large, whereas a product is large only if all factors are large.) Fig. 1 shows how a sum can be seen as a logical OR: the two orthogonal gradients from data A and data B at (0.5, 0.5) point in different directions, yet both are kept in the combined gradient. In Sec. 2.3 we elaborate on this idea and on implementing a logical AND between gradients. Before presenting this discussion, we take some time to better motivate the need for invariant learning consistency and to construct a precise mathematical definition of consistency.
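As a toy numeric check (the vectors are ours, mirroring the orthogonal gradients of Fig. 1), the arithmetic mean keeps both conflicting directions while an element-wise geometric mean suppresses them:

```python
import numpy as np

grad_a = np.array([1.0, 0.0])   # gradient from data A at (0.5, 0.5)
grad_b = np.array([0.0, 1.0])   # gradient from data B at (0.5, 0.5)

arithmetic = (grad_a + grad_b) / 2      # [0.5, 0.5]: both directions kept (OR)
geometric = np.sqrt(grad_a * grad_b)    # [0.0, 0.0]: only shared directions survive (AND)
print(arithmetic, geometric)
```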

2.1. FORMAL DEFINITION OF ILC

Let Θ*_A be the set of convergence points of algorithm A when trained using all environments (pooled data): that is, Θ*_A = {θ* ∈ Θ | ∃ θ_0 ∈ R^n such that A^∞(θ_0, E) = θ*}. For instance, if A is gradient descent, the result of Lee et al. (2016) implies that Θ*_A is the set of local minimizers of the pooled loss L. To each θ* ∈ Θ*_A we want to associate a consistency score, quantifying the concept "good θ* are hard to vary". In other words, we would like the score to capture the consistency of the loss landscape around θ* across the different environments. For example, in Fig. 1 the loss landscape near the bottom-left minimizer is consistent across environments, while the top-right minimizer is not.

(Figure: the loss surface for data A and the loss surface for data B around a minimizer θ*, with the neighbourhood N_ε^{A,θ*} highlighted.)
Let us characterize the landscape around θ* from the perspective of a fixed environment e ∈ E. We define the set N_ε^{e,θ*} to be the largest path-connected region of parameter space containing both θ* and the set {θ ∈ Θ : |L^e(θ) − L^e(θ*)| ≤ ε}, with ε > 0. In other words, if θ ∈ N_ε^{e,θ*} then there exists a path-connected region in parameter space including θ* and θ, where every parameter along it also lies in N_ε^{e,θ*} and its loss on environment e is comparable. From the perspective of environment e, all these points are equivalent to θ*. We would like to evaluate the elements of this set with respect to a different environment e′ ≠ e. We will say that e′ is consistent with e in θ* if max_{θ ∈ N_ε^{e,θ*}} |L^{e′}(θ) − L^e(θ)| is small. Repeating this reasoning for all environment pairs, we arrive at the following inconsistency score:

    I^ε(θ*) := max_{(e,e′) ∈ E²}  max_{θ ∈ N_ε^{e,θ*}} |L^{e′}(θ) − L^e(θ*)|.        (1)

This consistency is our formalization of the principle "good explanations are hard to vary". Finally, we can write down an invariant learning consistency score for A:

    ILC(A, p(θ_0)) := −E_{θ_0 ∼ p(θ_0)} [ I^ε(A^∞(θ_0, E)) ].        (2)

That is, the learning consistency of an algorithm measures the expected consistency across environments of the minimizer it converges to on the pooled data.

Example: low consistency of a classic patchwork solution. One-hidden-layer networks with sigmoid activations and enough neurons can approximate any function f* : [0, 1] → R (Cybenko, 1989). In appendix A.1 we show how the construction used to obtain the weights leads to a maximally inconsistent solution according to I^ε(θ*), which would not be expected to generalize o.o.d.

Here we draw a connection between our definition of inconsistency and the local geometric properties of the loss landscapes. For the sake of clarity, we consider two environments (A and B) and assume θ* to be a local minimizer (with zero loss) for both environments. Using a Taylor approximation, we get L(θ) ≈ (1/2) (θ − θ*)^T H_{A+B} (θ − θ*) for ‖θ − θ*‖ ≈ 0, where H_{A+B} = (H_A + H_B)/2 is the arithmetic mean of the Hessians H_A := ∇²L_A(θ*) and H_B := ∇²L_B(θ*). H_{A+B} does not capture the possibly conflicting geometries of landscapes A and B: It performs a "logical OR" on the dominant eigendirections. In contrast, the geometric mean, or Karcher mean, H_{A∧B} (Ando et al., 2004) is affected by the inconsistencies between landscapes: It performs a "logical AND". In appendix A.2 we give a formal definition of H_{A∧B}, and show that for diagonal Hessians I^ε(θ*) ≤ 2ε (det(H_{A+B}) / det(H_{A∧B}))². As for the geometric mean of positive numbers, 0 ≤ det(H_{A∧B}) ≤ det(H_{A+B}); thus, inconsistency is lowest when the shapes of A and B are similar, exactly as in the bottom-left minimizer of Fig. 1.
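To make the diagonal-Hessian case above concrete, the following short sketch (matrices invented for illustration) compares the arithmetic and geometric means of two diagonal Hessians via the determinant ratio det(H_{A+B}) / det(H_{A∧B}) that enters the bound:

```python
import numpy as np

# Diagonal Hessians of two environments at a shared minimizer theta*.
# A consistent minimum: similar curvatures. An inconsistent one: curvatures disagree.
H_A = np.diag([4.0, 1.0])
H_B_consistent = np.diag([4.0, 1.0])
H_B_inconsistent = np.diag([0.01, 100.0])

def inconsistency_ratio(H_a, H_b):
    """det(H_{A+B}) / det(H_{A AND B}) for diagonal positive-definite Hessians.

    For diagonal (hence commuting) matrices, the geometric (Karcher) mean
    reduces to the element-wise geometric mean of the diagonals.
    """
    arith = (np.diag(H_a) + np.diag(H_b)) / 2.0
    geom = np.sqrt(np.diag(H_a) * np.diag(H_b))
    return np.prod(arith) / np.prod(geom)

print(inconsistency_ratio(H_A, H_B_consistent))    # 1.0: shapes match, lowest bound
print(inconsistency_ratio(H_A, H_B_inconsistent))  # >> 1: conflicting geometries
```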

2.2. ILC AS A LOGICAL AND BETWEEN LANDSCAPES

(Figure: an illustration of two Hessians H_A and H_B, their arithmetic mean H_{A+B}, and their geometric mean H_{A∧B}.)

From Hessians to gradients. We just saw that the consistency of θ* is linked to the geometric mean of the Hessians {H^e(θ*)}_{e∈E}. Under the simplifying assumption that each H^e is diagonal and all eigenvalues λ_i^e are positive, their geometric mean is H_∧ := diag((∏_{e∈E} λ_1^e)^{1/|E|}, ..., (∏_{e∈E} λ_n^e)^{1/|E|}). The curvature of the corresponding loss in the i-th eigendirection depends on how consistent the curvatures of the individual environments are in that direction. Consider now optimizing from a point θ^k: gradient descent reads θ^{k+1} = θ^k − η H_+ (θ^k − θ*), where H_+ := diag((1/|E|) Σ_{e∈E} λ_1^e, ..., (1/|E|) Σ_{e∈E} λ_n^e). For η small enough, we have |θ_i^{k+1} − θ_i^*| = (1 − η (1/|E|) Σ_{e∈E} λ_i^e) |θ_i^k − θ_i^*|. As noted, this choice maximises the speed of convergence to θ*, but does not take into account whether this minimizer is consistent. We can reduce the speed of convergence in directions where the landscapes have different curvatures (which would lead to a high inconsistency) by following the gradients from the geometric mean of the landscapes, as opposed to the arithmetic mean. That is, we substitute the full gradient ∇L(θ) = H_+ (θ^k − θ*) with ∇L_∧(θ) = H_∧ (θ^k − θ*). Moreover, we have that ∇L_∧(θ) = (∏_{e∈E} ∇L^e(θ))^{1/|E|} element-wise: to reduce the speed of convergence in directions with inconsistency, we can take the element-wise geometric mean of the gradients from different environments (see also Fig. 11 in the appendix).
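A minimal sketch of this element-wise geometric mean of per-environment gradients, assuming sign-consistent components (the non-convex case with sign disagreements is handled by the AND-mask of the next section) and computing magnitudes in the log domain for stability; function and variable names are ours:

```python
import numpy as np

def geometric_mean_gradient(env_grads, eps=1e-12):
    """Element-wise geometric mean of gradients from different environments.

    env_grads: array of shape (num_envs, num_params). Assumes every environment
    agrees on the sign of each component; magnitudes are averaged in log-space.
    """
    signs = np.sign(env_grads[0])
    magnitude = np.exp(np.mean(np.log(np.abs(env_grads) + eps), axis=0))
    return signs * magnitude

grads = np.array([[0.2, 1.0, -0.5],
                  [0.8, 1.0, -0.1]])
# [0.4, 1.0, -0.2236...]: never larger than the arithmetic mean,
# and noticeably smaller where the environments disagree in magnitude.
print(geometric_mean_gradient(grads))
```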

2.3. MASKING GRADIENTS WITH A LOGICAL AND

The element-wise geometric mean of gradients, instead of the arithmetic mean, increases consistency in the convex quadratic case. However, there are a few practical limitations: (i) The geometric mean is only defined when all the signs are consistent; it remains to be specified how sign inconsistencies, which can occur in non-convex settings, should be dealt with. (ii) It provides little flexibility for 'partial' agreement: Even a single zero gradient component in one environment stops optimization in that direction. (iii) For numerical stability, it needs to be computed in the log domain (more computationally expensive). (iv) Adaptive step-size schemes (e.g. Adam (Kingma and Ba, 2015)) rescale the signal component-wise for local curvature adaptation; the exact magnitude of the geometric mean would be ignored, and most of the difference from arithmetic averaging would come from the zeroed components. Limitation (i) can be overcome by treating inconsistent signs as zeros, resulting in a geometric mean of 0 whenever there is any sign disagreement across environments for a gradient component. For (ii) we can allow for some disagreement (controlled by a hyperparameter), by not masking out a component if a large enough fraction of environments has gradients pointing in the same direction. (iii) and (iv) can be addressed together: Since the final magnitude will be rescaled except for masked components, i.e. where the geometric mean is 0, we can use the average gradients (fast to compute) and mask out components based on sign agreement (computable without the log domain).

The AND-mask. We translate the reasoning we just presented into a practical algorithm that we will refer to as the AND-mask. In its simplest implementation, we zero out those gradient components with respect to weights that have inconsistent signs across environments. Formally, the masked gradients at iteration k are m_t(θ^k) ⊙ ∇L(θ^k), where m_t(θ^k) vanishes for any component for which there are fewer than t ∈ {d/2, d/2 + 1, ..., d} agreeing gradient signs across environments (d is the number of environments in the batch), and is equal to one otherwise. For convenience, our implementation of the AND-mask uses a threshold τ ∈ [0, 1] as hyper-parameter instead of t, such that t = (d/2)(τ + 1). Mathematically, for every component [m_τ]_j of m_τ, [m_τ]_j = 1[ τ d ≤ | Σ_e sign([∇L^e]_j) | ]. Computing the AND-mask has the same time and space complexity as standard gradient descent, i.e., linear in the number of examples that we average. Due to its simplicity and computational efficiency, this is the algorithm that we will use in the experiment section. As a first result, we show that following the AND-masked gradient leads to convergence in the directions made visible by the AND-mask. The proof is presented in appendix A.3.

Proposition 1. Let L have L-Lipschitz gradients and consider a learning rate η ≤ 1/L. After k iterations, AND-masked GD visits at least once a point θ where ‖m_t(θ) ⊙ ∇L(θ)‖² ≤ O(1/k).

Behaviour in the face of randomness. Here we put the AND-mask through a theoretical test: For gradients coming from different environments that are inconsistent (or even random), how fast does the AND-mask reduce the magnitude of the step taken in parameter space, compared to standard GD? In case of inconsistency, the AND-mask should quickly make the gradient steps more conservative. To assess this property, we consider a fixed set of n parameters θ and gradients ∇L^e drawn independently from a multivariate Gaussian with zero mean and unit covariance.

Proposition 2. Consider the setting we just outlined, with L = (1/d) Σ_{e=1}^d L^e. While E‖∇L(θ)‖² = O(n/d), we have that for all t ∈ {d/2 + 1, ..., d} there exists c ∈ (1, 2] such that E‖m_t(θ) ⊙ ∇L(θ)‖² ≤ O(n/c^d).

The proof is presented in Appendix A.4, and an illustration with numerical verification in Fig. 4 (in the numerical verification, the magnitudes of masked gradients were always zero for more than 100 examples). Intuitively, in the presence of purely random patterns, the AND-mask has a desirable property: it decreases the strength of these signals exponentially fast in the number of environments, as opposed to linearly.
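The following is a minimal NumPy sketch of the AND-mask as described above, with the agreement threshold τ; the toy gradients and variable names are our own:

```python
import numpy as np

def and_mask_gradient(env_grads, tau=1.0):
    """Sketch of the AND-mask: average gradient, zeroed where signs disagree.

    env_grads: shape (d, num_params), one gradient per environment.
    tau in [0, 1]: tau = 1.0 requires full sign agreement across environments.
    """
    d = env_grads.shape[0]
    sign_agreement = np.abs(np.sign(env_grads).sum(axis=0))   # in {0, ..., d}
    mask = (sign_agreement >= tau * d).astype(env_grads.dtype)
    return mask * env_grads.mean(axis=0)

# Consistent components survive, conflicting ones are zeroed.
grads = np.array([[ 0.3, -0.2,  0.5],
                  [ 0.1,  0.4,  0.6],
                  [ 0.2,  0.1,  0.7]])
print(and_mask_gradient(grads, tau=1.0))  # [0.2, 0.0, 0.6]: middle component masked
```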

3. EXPERIMENTS

Real-world datasets are generated by (causal) generative processes which share mechanisms (Pearl, 2009). However, mechanisms and spurious signals are often entangled, making it hard to assess what part of the learning signal is due to either. As the goal of this paper is to dissect these two components to understand how they ultimately contribute to the learning process, we create a simple synthetic dataset that allows us to control the complexity, intensity, and number of shortcuts in the data. After that, we evaluate whether spurious signals can be detected even in high-dimensional networks and datasets by testing the AND-mask on a memorization task similar to the one proposed in Zhang et al. (2017), and on a behavioral cloning task using the game CoinRun (Cobbe et al., 2019a).

(Figure: illustration of the synthetic dataset, with panels for Environment A, Environment B, Pooled A & B, and Test o.o.d.; axes labelled d_S and d_M.)
K F z I i x J Z Q p b m 8 l b E g V Z c a m U 7 E h e I s v L 5 P H s 7 r n 1 r 2 7 8 1 r j u o i j D E d w D K f g w Q U 0 4 A a a 0 A I G A 3 i G V 3 h z h P P i v D s f 8 9 a S U 8 w c w h 8 4 n z 8 Z K Y 2 t < / l a t e x i t > d S < l a t e x i t s h a 1 _ b a s e 6 4 = " Z G d e v u G t b + D G Y S h a x N F H L H 7 a C A Q = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g x 4 r t R / Q h r L Z b N q l m 0 3 Y n Q g l 9 C d 4 8 a C I V 3 + R N / + N 2 z Y H r T 4 Y e L w 3 w 8 y 8 I J X C o O t + O a W 1 9 Y 3 N r f J 2 Z W d 3 b / + g e n j U M U m m G W + z R C a 6 F 1 D D p V C 8 j Q I l 7 6 W a 0 z i Q v B t M b u Z + 9 5 F r I x L 1 g N O U + z E d K R E J R t F K r X D Y G l Z r b t 1 d g P w l X k F q U K A 5 r H 4 O w o R l M V f I J D W m 7 7 k p + j n V K J j k s 8 o g M z y l b E J H v G + p o j E 3 f r 4 4 d U b O r B K S K N G 2 F J K F + n M i p 7 E x 0 z i w n T H F s V n 1 5 u J / X j / D 6 N r P h U o z 5 I o t F 0 W Z J J i Q + d 8 k F J o z l F N L K N P C 3 k r Y m G r K 0 K Z T s S F 4 q y / / J Z 2 L u u f W v f v L W u O 2 i K M M J 3 A K 5 + D B F T T g D p r Q B g Y j e I I X e H W k 8 + y 8 O e / L 1 p J T z B z D L z g f 3 y J B j b M = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " Z G d e v u G t b + D G Y S h a x N F H L H 7 a C A Q = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g x 4 r t R / Q h r L Z b N q l m 0 3 Y n Q g l 9 C d 4 8 a C I V 3 + R N / + N 2 z Y H r T 4 Y e L w 3 w 8 y 8 I J X C o O t + O a W 1 9 Y 3 N r f J 2 Z W d 3 b / + g e n j U M U m m G W + z R C a 6 F 1 D D p V C 8 j Q I l 7 6 W a 0 z i Q v B t M b u Z + 9 5 F r I x L 1 g N O U + z E d K R E J R t F K r X D Y G l Z r b t 1 d g P w l X k F q U K A 5 r H 4 O w o R l M V f I J D W m 7 7 k p + j n V K J j k s 8 o g M z y l b E J H v G + p o j E 3 f r 4 4 d U b O r B K S K N G 2 F J K F + n M i p 7 E x 0 z i w n T H F s V n 1 5 u J / X j / D 6 N r P h U o z 5 I o t F 0 W Z J J i Q + d 8 k F J o z l F N L K N P C 3 k r Y m G r K 0 K Z T s S F 4 q y / / J Z 2 L u u f W v f v L W u O 2 i K M M J 3 A K 5 + D B F T T g D p r Q B g Y j e I I X e H W k 8 + y 8 O e / L 1 p J T z B z D L z g f 3 y J B j b M = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " Z G d e v u G t b + D G Y S h a x N F H L H 7 a C A Q = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g x 4 r t R / Q h r L Z b N q l m 0 3 Y n Q g l 9 C d 4 8 a C I V 3 + R N / + N 2 z Y H r T 4 Y e L w 3 w 8 y 8 I J X C o O t + O a W 1 9 Y 3 N r f J 2 Z W d 3 b / + g e n j U M U m m G W + z R C a 6 F 1 D D p V C 8 j Q I l 7 6 W a 0 z i Q v B t M b u Z + 9 5 F r I x L 1 g N O U + z E d K R E J R t F K r X D Y G l Z r b t 1 d g P w l X k F q U K A 5 r H 4 O w o R l M V f I J D W m 7 7 k p + j n V K J j k s 8 o g M z y l b E J H v G + p o j E 3 f r 4 4 d U b O r B K S K N G 2 F J K F + n M i p 7 E x 0 z i w n T H F s V n 1 5 u J / X j / D 6 N r P h U o z 5 I o t F 0 W Z J J i Q + d 8 k F J o z l F N L K N P C 3 k r Y m G r K 0 K Z T s S F 4 q y / / J Z 2 L u u f W v f v L W u O 2 i K M M J 3 A K 5 + D B F T T g D p r Q B g Y j e I I X e H W k 8 + y 8 O e / L 1 p J T z B z D L z g f 3 y J B j b M = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " Z G d e v u G t b + D G Y S h a x N F H L H 7 a C A Q = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g x 4 r t R / Q h r L Z b N q l m 0 3 Y n Q g l 9 C d 4 8 a C I V 3 + R N / + N 2 z Y H r T 4 Y e L w 3 w 
8 y 8 I J X C o O t + O a W 1 9 Y 3 N r f J 2 Z W d 3 b / + g e n j U M U m m G W + z R C a 6 F 1 D D p V C 8 j Q I l 7 6 W a 0 z i Q v B t M b u Z + 9 5 F r I x L 1 g N O U + z E d K R E J R t F K r X D Y G l Z r b t 1 d g P w l X k F q U K A 5 r H 4 O w o R l M V f I J D W m 7 7 k p + j n V K J j k s 8 o g M z y l b E J H v G + p o j E 3 f r 4 4 d U b O r B K S K N G 2 F J K F + n M i p 7 E x 0 z i w n T H F s V n 1 5 u J / X j / D 6 N r P h U o z 5 I o t F 0 W Z J J i Q + d 8 k F J o z l F N L K N P C 3 k r Y m G r K 0 K Z T s S F 4 q y / / J Z 2 L u u f W v f v L W u O 2 i K M M J 3 A K 5 + D B F T T g D p r Q B g Y j e I I X e H W k 8 + y 8 O e / L 1 p J T z B z D L z g f 3 y J B j b M = < / l a t e x i t > d M < l a t e x i t s h a 1 _ b a s e 6 4 = " J 0 i X x F 8 1 m A u E g t A g V u Q W 5 A x L U 3 4 = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g 1 6 E i t Y W 2 l A 2 m 0 m 7 d L M J u x u h h P 4 E L x 4 U 8 e o v 8 u a / c d v m o K 0 P B h 7 v z T A z L 0 g F 1 8 Z 1 v 5 3 S y u r a + k Z 5 s 7 K 1 v b O 7 V 9 0 / e N R J p h i 2 W C I S 1 Q m o R s E l t g w 3 A j u p Q h o H A t v B 6 G r q t 5 9 Q a Z 7 I B z N O 0 Y / p Q P K I M 2 q s d B / 2 b / v V m l t 3 Z y D L x C t I D Q o 0 + 9 W v X p i w L E Z p m K B a d z 0 3 N X 5 O l e F M 4 K T S y z S m l I 3 o A L u W S h q j 9 v P Z q R N y Y p W Q R I m y J Q 2 Z q b 8 n c h p r P Y 4 D 2 x l T M 9 S L 3 l T 8 z + t m J r r 0 c y 7 T z K B k 8 0 V R J o h J y P R v E n K F z I i x J Z Q p b m 8 l b E g V Z c a m U 7 E h e I s v L 5 P H s 7 r n 1 r 2 7 8 1 r j u o i j D E d w D K f g w Q U 0 4 A a a 0 A I G A 3 i G V 3 h z h P P i v D s f 8 9 a S U 8 w c w h 8 4 n z 8 Z K Y 2 t < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J 0 i X x F 8 1 m A u E g t A g V u Q W 5 A x L U 3 4 = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g 1 6 E i t Y W 2 l A 2 m 0 m 7 d L M J u x u h h P 4 E L x 4 U 8 e o v 8 u a / c d v m o K 0 P B h 7 v z T A z L 0 g F 1 8 Z 1 v 5 3 S y u r a + k Z 5 s 7 K 1 v b O 7 V 9 0 / e N R J p h i 2 W C I S 1 Q m o R s E l t g w 3 A j u p Q h o H A t v B 6 G r q t 5 9 Q a Z 7 I B z N O 0 Y / p Q P K I M 2 q s d B / 2 b / v V m l t 3 Z y D L x C t I D Q o 0 + 9 W v X p i w L E Z p m K B a d z 0 3 N X 5 O l e F M 4 K T S y z S m l I 3 o A L u W S h q j 9 v P Z q R N y Y p W Q R I m y J Q 2 Z q b 8 n c h p r P Y 4 D 2 x l T M 9 S L 3 l T 8 z + t m J r r 0 c y 7 T z K B k 8 0 V R J o h J y P R v E n K F z I i x J Z Q p b m 8 l b E g V Z c a m U 7 E h e I s v L 5 P H s 7 r n 1 r 2 7 8 1 r j u o i j D E d w D K f g w Q U 0 4 A a a 0 A I G A 3 i G V 3 h z h P P i v D s f 8 9 a S U 8 w c w h 8 4 n z 8 Z K Y 2 t < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J 0 i X x F 8 1 m A u E g t A g V u Q W 5 A x L U 3 4 = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g 1 6 E i t Y W 2 l A 2 m 0 m 7 d L M J u x u h h P 4 E L x 4 U 8 e o v 8 u a / c d v m o K 0 P B h 7 v z T A z L 0 g F 1 8 Z 1 v 5 3 S y u r a + k Z 5 s 7 K 1 v b O 7 V 9 0 / e N R J p h i 2 W C I S 1 Q m o R s E l t g w 3 A j u p Q h o H A t v B 6 G r q t 5 9 Q a Z 7 I B z N O 0 Y / p Q P K I M 2 q s d B / 2 b / v V m l t 3 Z y D L x C t I D Q o 0 + 9 W v X p i w L E Z p m K B a d z 0 3 N X 5 O l e F M 4 K T S y z S m l I 3 o A L u W S h q j 9 v P Z q R N y Y p W Q R I m y J Q 2 Z q b 8 n c h p r P Y 4 D 2 x l T M 9 S L 3 l T 8 z + t m J r r 0 c y 7 T z K B k 8 0 V R J o h J y P R v E n K F z I i x J Z Q p b m 8 l b 
E g V Z c a m U 7 E h e I s v L 5 P H s 7 r n 1 r 2 7 8 1 r j u o i j D E d w D K f g w Q U 0 4 A a a 0 A I G A 3 i G V 3 h z h P P i v D s f 8 9 a S U 8 w c w h 8 4 n z 8 Z K Y 2 t < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J 0 i X x F 8 1 m A u E g t A g V u Q W 5 A x L U 3 4 = " > A A A B 6 n i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 l E 0 G P B g 1 6 E i t Y W 2 l A 2 m 0 m 7 d L M J u x u h h P 4 E L x 4 U 8 e o v 8 u a / c d v m o K 0 P B h 7 v z T A z L 0 g F 1 8 Z 1 v 5 3 S y u r a + k Z 5 s 7 K 1 v b O 7 V 9 0 / e N R J p h i 2 W C I S 1 Q m o R s E l t g w 3 A j u p Q h o H A t v B 6 G r q t 5 9 Q a Z 7 I B z N O 0 Y / p Q P K I M 2 q s d B / 2 b / v V m l t 3 Z y D L x C t I D Q o 0 + 9 W v X p i w L E Z p m K B a d z 0 3 N X 5 O l e F M 4 K T S y z S m l I 3 o A L u W S h q j 9 v P Z q R N y Y p W Q R I m y J Q 2 Z q b 8 n c h p r P Y 4 D 2 x l T M 9 S L 3 l T 8 z + t m J r r 0 c y 7 T z K B k 8 0 V R J o h J y P R v E n K F z I i x J Z Q p b m 8 l b E g V Z c a m U 7 E h e I s v L 5 P H s 7 r n 1 r 2 7 8 1 r j u o i j D E d w D K f g w Q U 0 4 A a a 0 A I G A 3 i G V 3 h z h P P i v D s f 8 9 a S U 8 w c w h 8 4 n z 8 Z K Y 2 t < / l a t e x i t > Figure 5 : A 4-dimensional instantiation of the synthetic memorization dataset for visualization. Every example is a dot in both circles, and it can be classified by finding either of the "oracle" decision boundaries shown.

3.1. THE SYNTHETIC MEMORIZATION DATASET

We introduce a binary classification task. The input dimensionality is d = d_M + d_S. The conditional p(y | x_{d_M}) is the same across all environments (the mechanism), whereas p(y | x_{d_S}, e) differs across environments (the shortcuts). While the mechanism is shared, it requires a highly non-linear decision boundary to classify the data. The shortcuts are not shared across environments, but provide a simple way to classify the data, even when all the environments are pooled together. See Figure 5 for a concrete example.

Despite the apparent simplicity of this dataset, note that it is challenging to find the invariant mechanism. In high dimensions, even with tens of pooled environments, the shortcuts allow for a simple classification rule under almost every classical definition of 'simple': the boundary is linear, it has a large margin, it can be expressed with small weights, it is fast to learn, robust to input noise, and has perfect accuracy with no i.i.d. generalization gap. Finding the complex decision boundary of the spirals, instead, is a fiddly process and arguably a much slower path towards small loss.

Baselines. We evaluate several domain-agnostic baselines (all multilayer perceptrons) with some of the most common regularizers used in deep learning: Dropout, L1, L2, and Batch normalization. We also consider methods that explicitly make use of the environment labels, namely: (i) Domain Adversarial Neural Networks (DANN) (Ganin et al., 2016), a method specifically designed to address domain adaptation by obfuscating domain information with an adversarial classifier; (ii) Invariant Risk Minimization (IRM) (Arjovsky et al., 2019), discussed in detail in appendix B. The AND-mask is trained with the same configurations reported in Table 1.

Results. Fig. 6 shows training and test accuracy. DANN fails because it can align the representation-layer distributions from different environments using only shortcuts, such that they become indistinguishable to the domain-discriminating classifier. The AND-mask was the only method to achieve perfect test accuracy, by fitting the spirals instead of the shortcuts. In particular, the combination of the AND-mask with L1 or L2 regularization gave the most robust results overall, as these regularizers help suppress neurons that at initialization are tuned towards the shortcuts.

Correlations between average, memorization and generalization gradients. Due to the synthetic nature of the dataset, we can intervene on its data-generating process in order to examine the learning signals coming from the mechanisms and from the shortcuts. We isolate the two and measure their contribution to the average gradients as we vary the agreement threshold of the mask. More precisely, we look at the gradients computed with respect to the weights of a randomly initialized network for three versions of the data: (i) the original data, with mechanisms and shortcuts; (ii) the dataset randomly permuted over the mechanism dimensions, leaving only the 'memorization' signal of the shortcuts; (iii) the dataset randomly permuted over the shortcut dimensions, isolating the 'generalization' signal of the mechanisms alone. Figure 7 shows the correlation between the components of the original average gradient (i) and the shortcut gradients ((ii), dashed line), and between the original average gradients and the mechanism gradients ((iii), solid line). While the signal from the mechanisms is present in the original average gradients (ρ ≈ 0.4 for τ = 0), its magnitude is smaller and it is 'drowned' by the memorization signal.
Instead, increasing the threshold of the AND-mask (right side) suppresses the memorization gradients due to the shortcuts, and for τ ≈ 1 most of the remaining gradient components carry signal from the mechanism. On the left side, we test the other side of our hypothesis: an XOR-mask, which zeroes out consistent gradients and preserves those with different signs, results in a sharper decrease of the correlation with the mechanism gradients.
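To make the masking step concrete, the following is a minimal sketch of the sign-agreement mask applied to a stack of per-environment gradients. It is an illustration rather than the reference implementation: the function name, the tensor layout, and the convention that tau = 1 requires all environments to agree are our own choices.

import torch

def and_mask(env_grads, tau):
    # env_grads: tensor of shape [n_envs, n_params], one flattened gradient per environment
    signs = torch.sign(env_grads)              # -1, 0, +1 per component and environment
    agreement = signs.mean(dim=0).abs()        # 1.0 when all environments agree on the sign
    mask = (agreement >= tau).float()          # AND-mask: keep only consistently signed components
    return mask * env_grads.mean(dim=0)        # masked version of the usual averaged gradient

In training, one gradient per environment (or per example) is computed, the mask is applied, and the masked average replaces the plain average in the optimizer update.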

3.2. EXPERIMENTS ON CIFAR-10

Memorization in a vision task. Zhang et al. (2017) showed that neural networks trained with standard regularizers -like L2 and Dropout -can still memorize large training datasets with shuffled labels, i.e. reach ≈100% training accuracy. Their experiments raised significant questions about the generalization properties of neural networks and the role of regularizers in constraining the hypothesis class. Our hypothesis is that ILC -for example implemented as the AND-mask -should prevent memorization on a similar task with shuffled labels, as gradients will tend to largely 'disagree' in the absence of a shared mechanism. However, when the labels are not shuffled, ILC should have a much weaker effect, as real shared mechanisms are still present in the data.

To test this hypothesis, we ran an experiment that closely resembles the one in Zhang et al. (2017) on CIFAR-10. We trained a ResNet on CIFAR-10 with random labels, with and without the AND-mask. In all experiments we used batch size 80 and treated each example as its own 'environment'. Recall that standard gradient averaging is equivalent to an AND-mask with threshold 0. As shown in Figure 8, the ResNet with standard averaged gradients memorized the data, while slightly increasing the threshold of the AND-mask quickly prevented memorization (dark blue line). In contrast, training the same networks on the dataset with the original labels resulted in both of them converging and generalizing to the test set, confirming that the mask does not significantly affect the generalization error when a genuine shared mechanism is present in the data.

Note that there is no standard notion of environments in CIFAR-10, which is why we treated every example as coming from its own environment. This assumption is not unreasonable, as every image in the dataset was literally collected in a different physical environment. If anything, it is the standard i.i.d. assumption that hides this variety behind the notion of a single distribution encompassing all environments. The results of this experiment further support this interpretation, and can serve as evidence that -in some cases -we may be able to identify invariances even without an explicit partition into environments, as these can already be identified at the level of individual examples.

Label noise. Following up on this experiment, we test how the AND-mask performs in the presence of label noise, i.e. when a portion of the labels in the training set is randomly shuffled (25% here). According to our hypothesis, gradients computed on examples with random labels should disagree and get masked out by the AND-mask, while the signal from correctly labeled data should contribute to updating the model. As shown in Figure 9, the performance on the incorrectly labeled portion of the dataset is well below chance for the AND-mask (as it predicts correctly despite the wrong labels), while the baseline again memorizes the incorrect labels. On the test set (with untouched labels), the baseline peaks early and then decreases as the model overfits, while the AND-mask improves slowly but steadily.
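For reference, the two label corruptions used above can be produced in a few lines. The sketch below assumes torchvision's CIFAR10 dataset and does not force a corrupted label to differ from the true one; the paper does not specify these details, so treat the snippet as illustrative.

import numpy as np
from torchvision.datasets import CIFAR10

train = CIFAR10(root="./data", train=True, download=True)
targets = np.array(train.targets)
rng = np.random.default_rng(0)

# (a) fully random labels, as in the memorization experiment
random_targets = rng.integers(0, 10, size=len(targets))

# (b) 25% label noise: re-draw the labels of a random quarter of the examples
noisy_targets = targets.copy()
idx = rng.choice(len(targets), size=len(targets) // 4, replace=False)
noisy_targets[idx] = rng.integers(0, 10, size=len(idx))

train.targets = noisy_targets.tolist()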

3.3. BEHAVIORAL CLONING ON COINRUN

CoinRun (Cobbe et al., 2019b) is a game introduced to test how RL agents generalize to novel situations. The agent needs to collect coins, jumping on top of walls and boxes and avoiding enemies. Each level is procedurally generated -i.e. it has a different combination of sprites, background, and layout -but the physics and goals are invariant. Cobbe et al. (2019b) showed that state-of-the-art RL algorithms fail to model these invariant mechanisms, performing poorly on new levels unless trained on thousands of them. To test our hypothesis, we set up a behavioral cloning task using CoinRun. We start by pre-training a strong policy π* using standard PPO (Schulman et al., 2017) for 400M steps on the full distribution of levels. We then generate a dataset of pairs (s, π*(a|s)) from the on-policy distribution. The training data consists of 1000 states from each of 64 levels, while test data comes from 2000 levels. A ResNet-18 π_θ is then trained to minimize the loss D_KL(π* || π_θ) on the training set. We compare the generalization performance of regular Adam to a version that uses the AND-mask. For each method we ran an automatic hyperparameter optimization study using Tree-structured Parzen Estimation (Bergstra et al., 2013) of 1024 trials; results are shown in Figure 18.
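For concreteness, the snippet below sketches the behavioral-cloning objective D_KL(π* || π_θ) for a batch of states. The function name and tensor shapes are our own; only the loss itself comes from the text.

import torch.nn.functional as F

def bc_kl_loss(expert_probs, student_logits):
    # expert_probs:   [B, n_actions] action distribution of the pre-trained policy pi*
    # student_logits: [B, n_actions] unnormalized scores of the cloned policy pi_theta
    log_q = F.log_softmax(student_logits, dim=-1)
    log_p = expert_probs.clamp_min(1e-8).log()
    # KL(pi* || pi_theta) = sum_a pi*(a|s) * (log pi*(a|s) - log pi_theta(a|s))
    return (expert_probs * (log_p - log_q)).sum(dim=-1).mean()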

4. RELATED WORK

Generalization and covariate shift. The classic formulation of statistical learning theory (Vapnik) concerns learning from independent and identically distributed samples. The case where the distribution of the covariates at test time differs from the one observed during training is termed covariate shift (Sugiyama et al., 2007; Quionero-Candela et al., 2009; Sugiyama and Kawanabe, 2012). Standard solutions involve re-weighting the training examples, but require the additional assumption of overlapping supports for the train and test distributions.

Causal models and invariances. As mentioned in the Introduction, causality provides a strong motivation for our work, based on the notion that statistical dependencies are epiphenomena of an underlying causal model (Pearl, 2009; Peters et al., 2017). The causal description identifies stable elements -e.g. physical mechanisms -connecting causes and effects, which are expected to remain invariant under interventions or changing external conditions (Haavelmo, 1943; Schölkopf et al., 2012). This motivates our notion of invariant mechanisms, and inspired related notions which have been proposed for robust regression (Rojas-Carulla et al., 2018; Heinze-Deml et al., 2018; Arjovsky et al., 2019; Hermann and Lampinen, 2020; Ahuja et al., 2020; Krueger et al., 2020). We discuss this in more detail in appendix C.1.

Domain generalization. ILC can be used in a domain generalization setting (Muandet et al., 2013), but it is not limited to it: as demonstrated in the experiments in Section 3.2, the AND-mask can be applied even if domain labels are not available. In contrast, by treating every example as a single domain, methods relying on domain classifiers (like DANN (Ganin et al., 2016) or Balaji et al. (2018)) would require as many output units as there are training examples (i.e. 50,000 for CIFAR-10).

Gradient agreement. Using gradient agreement to learn meaningful representations in neural networks has been explored in (Du et al., 2018; Eshratifar et al., 2018; Fort et al., 2019; Zhang et al., 2019b). These approaches mainly rely on a measure of cosine similarity between gradients, which we did not consider here for two main reasons: (i) it is a 'global' property of the gradients, and would not allow us to extract precise information about different patterns in the network; (ii) it is unclear how to extend it beyond pairs of vectors, and for pairwise interactions its computational cost scales quadratically in the number of examples used.

5. CONCLUSIONS

Generalizing out of distribution is one of the most significant open challenges in machine learning, and relying on invariances across environments or examples may be key in certain contexts. In this paper we analyzed how neural networks trained by averaging gradients across examples can converge to solutions that ignore the invariances, especially if these are harder to learn than spurious patterns. We argued that if learning signals are collected one example at a time -as is the case for gradients computed with backpropagation -the way these signals are aggregated can play a significant role in the patterns that are ultimately expressed: averaging gradients in particular can be too permissive, acting as a logical OR of a collection of distinct patterns, and lead to a 'patchwork' solution. We introduced and formalized the concept of Invariant Learning Consistency, and showed how to learn invariances even in the face of alternative explanations that -although spurious -fulfill most characteristics of a good solution. The AND-mask is but one of multiple possible ways to improve consistency, and it is unlikely to be a practical algorithm for all applications. However, we believe this should not distract from the general idea we are trying to put forward -namely, that it is worthwhile to study the learning of explanations that are hard to vary, with the longer term goal of advancing our understanding of learning, memorization and generalization.

A APPENDIX TO SECTION 2

A.1 A CLASSIC EXAMPLE OF A PATCHWORK SOLUTION

Consider a neural network with one hidden layer consisting of two neurons and sigmoidal activations:

f_θ(x) = θ_5 σ(θ_1 x + θ_2) + θ_6 σ(θ_3 x + θ_4),  σ(z) := 1 / (1 + e^{-z}).  (3)

We want to learn the continuous function f*: [0, 1] → [0, 2] defined piecewise as

f*(x) = 0 for x ∈ [0, 0.4);  10(x - 0.4) for x ∈ [0.4, 0.5);  1 for x ∈ [0.5, 0.7);  10(x - 0.7) + 1 for x ∈ [0.7, 0.8);  2 for x ∈ [0.8, 1].

To perform this task, we have access to (noiseless) data from two environments:

A: {(x, f*(x)) | x ∈ [0, 0.5)},  B: {(x, f*(x)) | x ∈ [0.5, 1]}.

There is a simple constructive way, provided by the universal function approximation theorem (Cybenko, 1989), to fit this function using f_θ up to an arbitrarily small mean squared error L_{A+B}(θ*). Leaving out the details of such a construction (see Cybenko (1989)), the reader can check on the left panel of Figure 10 that θ* = (100, -50, 100, -75, 1, 1) provides a good fit for both environments A and B: both L_A(θ*) and L_B(θ*) are small.

However, it is easy to see that θ* -while being a solution which can be returned by gradient descent on the pooled data A+B -is not consistent (the formal definition is given in Section 2 of the main paper). Indeed, it is possible to modify θ* such that the loss in environment A remains almost unchanged, while the loss in environment B gets larger. In particular, on the right panel of Figure 10 we show that θ̃ = (100, -50, 100, -75, 1, -0.5) is such that L_A(θ̃) ≤ L_A(θ*) + ε (with ε very small) but L_B(θ*) ≪ L_B(θ̃). According to our definition in Equation 1 (see main paper), we have I_ε(θ*) ≥ |L_B(θ*) - L_B(θ̃)|, which is a large number (low consistency).
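A quick numerical check of this example, under the parameter values stated above (the evaluation grid is an arbitrary choice for illustration), looks as follows.

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def f_theta(x, th):
    # f_theta(x) = th_5 * sigma(th_1 x + th_2) + th_6 * sigma(th_3 x + th_4)
    return th[4] * sigma(th[0] * x + th[1]) + th[5] * sigma(th[2] * x + th[3])

def f_star(x):
    return np.select(
        [x < 0.4, x < 0.5, x < 0.7, x < 0.8],
        [0.0, 10 * (x - 0.4), 1.0, 10 * (x - 0.7) + 1.0],
        default=2.0,
    )

x_A = np.linspace(0.0, 0.5, 500, endpoint=False)    # environment A
x_B = np.linspace(0.5, 1.0, 500)                     # environment B
theta_star  = np.array([100.0, -50.0, 100.0, -75.0, 1.0,  1.0])
theta_tilde = np.array([100.0, -50.0, 100.0, -75.0, 1.0, -0.5])

for name, th in [("theta*", theta_star), ("theta~", theta_tilde)]:
    L_A = np.mean((f_theta(x_A, th) - f_star(x_A)) ** 2)
    L_B = np.mean((f_theta(x_B, th) - f_star(x_B)) ** 2)
    print(name, L_A, L_B)   # L_A barely changes between the two settings, L_B degrades for theta~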
Remark 1 (Connection to out of distribution generalization). The main point of this analysis was to show an example where our measure of consistency behaves according to expectations: a typical implementation of the universal approximation theorem -which one would not expect to generalize out of distribution, due to its 'patchwork' behavior -indeed leads to a very low consistency score.

A.2 SECTION 2.2: CONSISTENCY AS ARITHMETIC/GEOMETRIC MEAN OF LANDSCAPES

Geometric mean of matrices. Given an n-tuple of d × d positive definite matrices (A_j)_{j=1}^n, the geometric (Karcher) mean (Ando et al., 2004) is the unique positive definite solution X to the equation Σ_{i=1}^m log(A_i^{-1} X) = 0, where log is the matrix logarithm. This matrix average has many desirable properties, which make it relevant to signal processing and medical imaging. The Karcher mean can also be written as arg min_{X ∈ S_{++}(d)} f(X) = (1/2m) Σ_{i=1}^m d(A_i, X)², where d is the Riemannian distance in the manifold S_{++}(d) of SPD matrices.

Link between consistency and geometric means. Here we show how the consistency score introduced in Equation 1 can be linked (in a simplified setting) to a comparison between the arithmetic and geometric means of the Hessians approximating the landscapes of two separate environments A and B. At the local minimizer θ* = 0, we assume that L_A = L_B = 0 and consider the local quadratic approximations L_A(θ) = (1/2) θᵀ H_A θ and L_B(θ) = (1/2) θᵀ H_B θ. Here, we make the additional simplifying assumption that H_A and H_B are diagonal (or, more broadly, co-diagonalizable): H_A = diag(λ^A_1, ..., λ^A_n), H_B = diag(λ^B_1, ..., λ^B_n), with λ^A_i ≥ 0 and λ^B_i ≥ 0 for all i = 1, ..., n. The arithmetic and geometric means (denoted H_{A+B} and H_{A∧B}) of these matrices are, in this simplified setting:

H_{A+B} = diag( (λ^A_1 + λ^B_1)/2, ..., (λ^A_n + λ^B_n)/2 ),  H_{A∧B} = diag( √(λ^A_1 λ^B_1), ..., √(λ^A_n λ^B_n) ).

As motivated in the main paper and in Figure 12, one can link the consistency of two landscapes to a comparison between the geometric and arithmetic means of the corresponding Hessians.

Proposition 3. In the setting just described, the consistency score in Equation 1 can be estimated as

I_ε(θ*) ≤ 2ε ( det(H_{A+B}) / det(H_{A∧B}) )².

Before showing the proof, we note that the proposition gives a lower bound on the consistency, that is, a pessimistic estimate. Yet, as we motivated, this estimate has a nice geometric interpretation. Moreover, as we outline in a remark after the proof, the estimate is tight in two important limit cases.

Proof. In this setting, Equation 1 gives

I_ε(θ*) := max { max_{L_A(θ) ≤ ε} L_B(θ), max_{L_B(θ) ≤ ε} L_A(θ) }.

Recall that L_A(θ) = (1/2) θᵀ H_A θ = (1/2) Σ_i λ^A_i θ_i². Hence, this is a simple quadratic program with quadratic constraints, and

max_{L_A(θ) ≤ ε} L_B(θ) = max_{(1/2) Σ_i λ^A_i θ_i² ≤ ε} (1/2) Σ_i λ^B_i θ_i².

Further, we can change variables and introduce θ̃_i = θ_i √(λ^A_i / 2). The problem gets even simpler:

max_{L_A(θ) ≤ ε} L_B(θ) = max_{||θ̃||² ≤ ε} Σ_i (λ^B_i / λ^A_i) θ̃_i² = ε · max_i (λ^B_i / λ^A_i).

All in all, we get

I_ε(θ*) = ε · max_i max{ λ^B_i / λ^A_i, λ^A_i / λ^B_i } ≤ ε · max_i ( λ^B_i / λ^A_i + λ^A_i / λ^B_i ) = ε · max_i [ ((λ^B_i)² + (λ^A_i)²) / (λ^B_i λ^A_i) ] ≤ ε · max_i [ (λ^B_i + λ^A_i)² / (λ^B_i λ^A_i) ].
This means

√( I_ε(θ*) / ε ) ≤ max_i (λ^B_i + λ^A_i) / √(λ^B_i λ^A_i) = 2 max_i [ (λ^B_i + λ^A_i)/2 ] / √(λ^B_i λ^A_i) ≤ 2 Π_i [ (λ^B_i + λ^A_i)/2 ] / Π_i √(λ^B_i λ^A_i) = 2 det(H_{A+B}) / det(H_{A∧B}),

where the first inequality comes from the monotonicity of the square root function, and the second inequality comes from the fact that (i) the geometric mean is always smaller than or equal to the arithmetic mean and (ii) for any sequence of numbers α_i > 1, max_i α_i ≤ Π_i α_i.

Remark 2 (Sanity check). There are two important cases where we can test the bound above. First, if H_A = H_B, then I_ε(θ*) = ε, and the bound returns I_ε(θ*) ≤ 2ε, since the geometric and arithmetic means coincide. Next, say λ^A_i = 0 but λ^B_i > 0; then, both the bound and the inconsistency score are ∞ (highest possible inconsistency).
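As an illustration of the bound, using the eigenvalues from Figure 3 and taking ε = 1, the exact inconsistency of the diagonal quadratic model can be compared to the determinant-ratio estimate. This is a small check under those assumed values, not part of the original analysis.

import numpy as np

lam_A = np.array([0.05, 1.0])    # eigenvalues of H_A (values from Figure 3)
lam_B = np.array([1.0, 0.05])    # eigenvalues of H_B

det_arith = np.prod((lam_A + lam_B) / 2)      # det of the arithmetic mean H_{A+B}
det_geom  = np.prod(np.sqrt(lam_A * lam_B))   # det of the geometric (Karcher) mean H_{A^B}

exact = max((lam_B / lam_A).max(), (lam_A / lam_B).max())   # I_eps / eps from the proof above
bound = 2 * (det_arith / det_geom) ** 2                      # estimate from Proposition 3

print(exact, bound)   # 20.0 vs roughly 60.8: the estimate is pessimistic, as noted above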

A.3 PROOF OF PROPOSITION 1

In this appendix section we consider the AND-masked GD algorithm introduced at the end of Section 2. We recall that the masked gradients at iteration k are m_t(θ_k) ⊙ ∇L(θ_k), where m_t(θ_k) vanishes for any component for which fewer than t ∈ {d/2 + 1, ..., d} gradient signs agree across environments, and is equal to one otherwise. In a full-batch setting, the algorithm is

θ_{k+1} = θ_k - η m_t(θ_k) ⊙ ∇L(θ_k),  (AND-masked GD)

where η > 0 is the learning rate.

Proposition 1. Let L have L-Lipschitz gradients and consider a learning rate η ≤ 1/L. After k iterations, AND-masked GD visits at least once a point θ where ||m_t(θ) ⊙ ∇L(θ)||² ≤ O(1/k).

Proof. Thanks to the component-wise L-smoothness and using a Taylor expansion around θ_i, we have

L(θ_{i+1}) ≤ L(θ_i) - η ⟨∇L(θ_i), m_t(θ_i) ⊙ ∇L(θ_i)⟩ + (L η² / 2) ||m_t(θ_i) ⊙ ∇L(θ_i)||² = L(θ_i) - (η - L η² / 2) ||m_t(θ_i) ⊙ ∇L(θ_i)||².

If we require η - L η² / 2 ≥ η / 2, then η ≤ 1/L, as assumed in the proposition statement. Therefore, L(θ_{i+1}) ≤ L(θ_i) - (η/2) ||m_t(θ_i) ⊙ ∇L(θ_i)||² for all i ≥ 0. Summing over i from 0 to a desired iteration k, we get

Σ_{i=0}^{k-1} (η/2) ||m_t(θ_i) ⊙ ∇L(θ_i)||² ≤ L(θ_0) - L(θ_k) ≤ L(θ_0).

Therefore,

min_{i=0,...,k} ||m_t(θ_i) ⊙ ∇L(θ_i)||² ≤ (1/k) Σ_{i=0}^{k-1} ||m_t(θ_i) ⊙ ∇L(θ_i)||² ≤ 2 L(θ_0) / (η k).

Hence, there exists an iteration i* ∈ {0, ..., k} such that ||m_t(θ_{i*}) ⊙ ∇L(θ_{i*})||² ≤ O(1/k).
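The following toy run illustrates the algorithm and the guarantee. Two hypothetical environments (our own construction, purely for illustration) agree on the gradient sign of one coordinate but pull the other in opposite directions near the origin, so the mask freezes the inconsistent coordinate while the consistent one converges and the masked gradient norm shrinks.

import numpy as np

def env_grads(theta):
    # Two toy environments: they agree on the sign of the gradient for theta[1]
    # (the shared direction), but disagree on theta[0] once |theta[0]| < 1.
    g1 = np.array([theta[0] + 1.0, theta[1]])
    g2 = np.array([theta[0] - 1.0, theta[1]])
    return np.stack([g1, g2])

def and_masked_gd_step(theta, eta=0.4, t=2):
    g = env_grads(theta)
    n_pos = (g > 0).sum(axis=0)
    n_neg = (g < 0).sum(axis=0)
    mask = (np.maximum(n_pos, n_neg) >= t).astype(float)   # keep components with >= t agreeing signs
    masked = mask * g.mean(axis=0)
    return theta - eta * masked, float(np.sum(masked ** 2))

theta = np.array([0.5, 3.0])
for k in range(30):
    theta, sq_norm = and_masked_gd_step(theta)

print(theta, sq_norm)   # theta[0] is left untouched; the masked gradient norm decays towards 0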

A.4 PROOF OF PROPOSITION 2

Here we fix parameters θ ∈ R^n and assume gradients ∇L_e(θ) ∈ R^n coming from environments e ∈ E are drawn independently from a multivariate Gaussian with zero mean and covariance σ² I. We want to show that, in this random setting, the AND-mask introduced in Section 2.3 decreases the magnitude of the gradient step.

Proposition 2. Consider the setting just outlined, with L = (1/d) Σ_{e=1}^d L_e. While E||∇L(θ)||² = O(n/d), we have that for all t ∈ {d/2 + 1, ..., d} there exists c ∈ (1, 2] such that E||m_t(θ) ⊙ ∇L(θ)||² ≤ O(n/c^d).

Proof. Let us drop the argument θ for ease of notation. First, let us consider ∇L (no gradient AND-mask):

E|| (1/d) Σ_{i=1}^d ∇L_{e_i} ||² = (1/d²) Σ_{i=1}^d E||∇L_{e_i}||² = n σ² / d,

where in the first equality we used the fact that the ∇L_{e_i} are uncorrelated, and in the second the fact that E[||∇L_{e_i}||²] is the trace of the covariance of ∇L_{e_i}.

Next, assume we apply the element-wise AND-mask m_t to the gradients, which sets to zero the components (dimensions) for which there are fewer than t ∈ {d/2, ..., d} equal signs. Since Gaussians are symmetric around zero, the probability of having exactly u positive j-th gradient components among the d environments is Pr(p_j = u) = (1/2)^d (d choose u). Hence, the probability of keeping the j-th gradient direction (considering also negative consistency) is twice the probability that at least t of the d signs are positive.

We would now like to compute E||m_t ⊙ ∇L||². The difficulty lies in the fact that the event m_t = 1 makes gradients conditionally dependent. Indeed, conditioning on both m_t = 1 and [∇L_e]_j > 0 changes the distribution of [∇L_{e'}]_j: this gradient entry becomes more likely to be positive or negative, depending on the value of [∇L_e]_j and on the details of the gradient mask. To solve this issue, our strategy is to reduce the discussion (without loss of generality and with no additional assumptions) to the case where gradient entries all have the same sign, so that conditional independence is restored.

We expand the quantity of interest using the definition of the 2-norm, the law of total expectation (conditioning on p_j), and the symmetry of the problem with respect to positive and negative numbers; since the gradient components within the same environment are conditionally independent given p_j, the expectation decomposes over the components j ∈ {1, ..., n}. Each term can then be bounded by conditioning on the extreme event p_j = d: indeed, if all environments lead to positive (or, symmetrically, negative) and non-interacting gradients in the j-th direction, the average will be the biggest in norm. Moreover -crucially -conditioned on the event p_j = d, gradients coming from different environments are distributed as positive half-normal distributions, and they are conditionally independent: since they are all positive, the value of a gradient in one environment cannot influence the value of the gradient in another one. We remark that conditional independence on the right-hand side is therefore not an assumption, but is intrinsic to the upper bound. Putting it all together, we have

E|| m_t ⊙ ( (1/d) Σ_{i=1}^d ∇L_{e_i} ) ||² ≤ 2n Σ_{u=t}^{d} E[ ( (1/d) Σ_{i=1}^d [∇L_{e_i}]_j )² | p_j = d ] (1/2)^d (d choose u) ≤ 2n Σ_{u=t}^{d} σ² (1/2)^d (d choose u) ≤ σ² n (d - t) (d choose t) (1/2)^{d-1},

where in the second line we bounded the squared average of a sum of half-normal variables: let {X_i}_{i=1}^d be a family of uncorrelated positive half-normal variables derived from Gaussians with mean zero and variance σ²; then E[X_i] = σ √(2/π) and E[X_i²] = σ². Also, E[X_i X_j] = E[X_i] E[X_j] ≤ σ² for i ≠ j. Therefore,

E[ ( (1/d) Σ_{i=1}^d X_i )² ] = (1/d²) Σ_{i,j=1}^d E[X_i X_j] ≤ σ².

Finally, if we set r = t/d ∈ (0.5, 1], we have (d choose t) ≈ ( 1 / (r^r (1-r)^{1-r}) )^d as d → ∞ (discarding all polynomial terms). Hence (d choose t) is of the form q^d, with 1 ≤ q < 2. So the quantity σ² n (d - t) (d choose t) (1/2)^{d-1} decreases exponentially at a rate O(n / (2 - q)^d). Notably, if t = d/2, then we lose the exponential rate and get back to O(n/d).
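The gap between the two expectations is easy to reproduce empirically. The sketch below samples random per-environment gradients as in the proposition, using the dimensions reported in Figure 4 (|θ| = 3000, t = 0.8d); the number of environments d = 16 is our own choice for the example.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 3000, 16, 1.0            # parameters, environments, noise scale (d = 16 is assumed)
t = int(round(0.8 * d))                # agreement threshold, as in Figure 4

grads = rng.normal(0.0, sigma, size=(d, n))     # one random gradient per environment
avg = grads.mean(axis=0)

n_pos = (grads > 0).sum(axis=0)
mask = np.maximum(n_pos, d - n_pos) >= t        # AND-mask on sign agreement
masked = np.where(mask, avg, 0.0)

print(np.sum(avg ** 2), np.sum(masked ** 2))    # ~ n * sigma^2 / d  vs  a much smaller value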

B.3.2 DOMAIN ADVERSARIAL NEURAL NETWORKS

The experiments using DANN follow a similar pattern. The model consists of an embedding network, a classification network, and a "domain discrimination" network. All three modules are two-layer multilayer perceptrons (MLPs). The number of hidden units of all MLPs is sampled from the range specified in Table 1, and we trained 100 models. Both the label classifier and the domain discriminator are applied to the output of the embedding network. The label classifier is trained to minimize the cross-entropy loss between the predicted and the true label. Similarly, the domain discriminator is trained to minimize the loss between the predicted and the true domain label. The embedding network is trained to minimize the regular task classification loss and, at the same time, to maximize the domain loss achieved by the domain discriminator.
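The adversarial part of this objective is commonly implemented with a gradient reversal layer (Ganin et al., 2016). The sketch below is a generic version, not the exact training code used for these experiments: the hidden width of 256, the input dimension d_S + d_M = 34, the 16 domains, and the single combined loss are assumptions taken from other parts of this appendix.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips (and scales) the gradient in the backward
    # pass, so the embedding is trained to maximize the domain-discrimination loss.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

embed       = nn.Sequential(nn.Linear(34, 256), nn.LeakyReLU(), nn.Linear(256, 256), nn.LeakyReLU())
label_head  = nn.Sequential(nn.Linear(256, 256), nn.LeakyReLU(), nn.Linear(256, 2))
domain_head = nn.Sequential(nn.Linear(256, 256), nn.LeakyReLU(), nn.Linear(256, 16))

def dann_loss(x, y, e, lambd=1.0):
    h = embed(x)
    task_loss   = F.cross_entropy(label_head(h), y)                              # predict the class label
    domain_loss = F.cross_entropy(domain_head(GradReverse.apply(h, lambd)), e)   # predict the environment
    return task_loss + domain_loss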

B.3.3 INVARIANT RISK MINIMIZATION

For the experiments using IRM we used the authors' PyTorch implementation from https://github.com/facebookresearch/InvariantRiskMinimization. We performed a random hyperparameter search over the ranges reported in the IRM hyperparameter table. In Figure 14 we show the learning curves of training and test accuracy for the different methods.

B.3.5 CORRELATION PLOTS

For the correlation plots in Figure 7 we used a randomly initialized MLP with the following configuration: 3 hidden layers, 256 hidden units. The dataset used 16 environments and batches of size 1024. The lines in Figure 7 are linear least-squares regressions on the gradient data shown as scatter plots. We repeat the experiment 10 times with different network weight seeds, resulting in the 10 regression lines. Zero gradients are excluded from the regression computation, as most gradients are masked out by the product mask in both cases.

B.6 SECTION 3.3: BEHAVIORAL CLONING ON COINRUN

The target policy π* is obtained by training PPO (Schulman et al., 2017) for 400M time steps using the code released for Cobbe et al. (2020). This policy is trained on the full distribution of levels in order to maximize its generality. We use π* to generate a behavioral cloning (BC) dataset, consisting of pairs (s, π*(a|s)), where s are the input images (64 × 64 RGB) and π*(a|s) is the discrete probability distribution over actions output by π*. A ResNet-18 π_θ is trained to minimize the loss D_KL(π* || π_θ). We ran two automatic hyperparameter optimization studies using Tree-structured Parzen Estimation (TPE) (Bergstra et al., 2013) of 1024 trials each, with and without the AND-mask. The learning rate was decayed by a factor of 10 halfway and at 3/4 of the training epochs. The "temporal" version of the AND-mask used for this experiment is reported in Algorithm 1.

Algorithm 1: Temporal AND-mask Adam
m ← β1 · m + (1 − β1) · g
v ← β2 · v + (1 − β2) · (g ∘ g)
a ← β3 · a + (1 − β3) · elemwise_sign(g)
b ← 1[|a| ≥ τ]
θ ← θ − α (m ∘ b) / (√v + ε)

The updates of a and b are the additions compared to traditional Adam. The threshold τ and β3 are hyperparameters that we included in the 1024 trials of the TPE search. For the top 10 runs, the hyperparameter values selected via the TPE search for the AND-mask are reported in the corresponding table. We found that applying weight decay as a second independent update after the AND-mask routine improved performance. To keep the comparison fair, we added this as a switch in the hyperparameter search for the Adam baseline as well, and it improved performance there too.
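A PyTorch-style sketch of Algorithm 1 is given below. It follows the pseudocode directly (in particular, it omits Adam's bias correction, as the pseudocode does); the class name and the default values of β3 and τ are illustrative.

import torch

class TemporalANDMaskAdam:
    # Adam plus an exponential moving average `a` of the gradient signs;
    # components whose sign-EMA magnitude is below tau are masked out.
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), beta3=0.9, tau=0.5, eps=1e-8):
        self.params = [p for p in params]
        self.lr, self.beta1, self.beta2 = lr, betas[0], betas[1]
        self.beta3, self.tau, self.eps = beta3, tau, eps
        self.state = [dict(m=torch.zeros_like(p), v=torch.zeros_like(p), a=torch.zeros_like(p))
                      for p in self.params]

    @torch.no_grad()
    def step(self):
        for p, s in zip(self.params, self.state):
            if p.grad is None:
                continue
            g = p.grad
            s["m"].mul_(self.beta1).add_(g, alpha=1 - self.beta1)
            s["v"].mul_(self.beta2).addcmul_(g, g, value=1 - self.beta2)
            s["a"].mul_(self.beta3).add_(torch.sign(g), alpha=1 - self.beta3)
            b = (s["a"].abs() >= self.tau).to(g.dtype)          # temporal AND-mask
            p.add_(-self.lr * (s["m"] * b) / (s["v"].sqrt() + self.eps))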
Causal graphs and causal factorizations. The formalization of causality through directed acyclic graphs (Pearl, 2009) is a key element informing our exposition. According to this formalization, a causal model gives rise to each observed distribution. It is thereby possible to exploit properties of the causal factorization of the joint probability distribution over the observed variables. Clearly, there are many ways to factorize a joint distribution into conditionals; a distinguishing feature of the causal factorization is that many of the conditionals, which we can think of as physical mechanisms underlying the represented statistical dependencies, are expected to remain invariant under interventions or changing external conditions. This postulate has appeared in various forms in the literature (Haavelmo, 1943; Simon, 1953; Hurwicz, 1962; Pearl, 2009; Schölkopf et al., 2012).

Causal models and robust regression. Based on this insight, it was proposed that regression based on causal features should present desirable invariance and robustness properties (Mooij et al., 2009; Schölkopf et al., 2012; Peters et al., 2016; Rojas-Carulla et al., 2018; Heinze-Deml et al., 2018; von Kügelgen et al., 2019; Parascandolo et al., 2018). In this view, the mechanisms can be considered as features of the patterns that support stable conditional probabilities; learning the mechanisms may therefore help achieve stable performance across a number of conditions. Other works connecting causality and learning through invariances are (Subbaswamy et al., 2019; Heinze-Deml and Meinshausen, 2017) and, perhaps most related to our work, (Arjovsky et al., 2019): we present a comparison with this method in the following section.

Causal regularization. Recently, Janzing (2019) showed that biasing learning towards models of lower complexity might in some cases be beneficial for a notion of generalization from observational to interventional regimes. Our proposed solution is however different, in that we only indirectly deal with penalizing model complexity, and rather focus on our proposed notion of consistency.

C.2 LEARNING INVARIANCES IN THE DATA

Here we compare ILC to other approaches for learning invariances in the data with neural networks, and in particular to Invariant Risk Minimization (IRM) (Arjovsky et al., 2019). The authors of IRM analyze a setup where minimizing training error might lead to models which absorb all the correlations found within the training data, thus failing to recover the relevant causal explanation. They consider a multi-environment setting and focus on the objective of extracting data representations that lead to invariant prediction across environments. While the high-level objective is close to the one we focus on, the differences become clear when considering the definition of invariant predictors presented in Arjovsky et al. (2019):

Definition 1. A data representation Φ: X → H elicits an invariant predictor w ∘ Φ across environments E if there is a classifier w: H → Y simultaneously optimal for all environments, i.e., w ∈ arg min_{w̄: H → Y} R^e(w̄ ∘ Φ) for all e ∈ E.

In particular, the objective minimized by IRM is

min_Φ Σ_{e ∈ E} [ R^e(Φ) + λ · || ∇_{w | w=1.0} R^e(w · Φ) ||² ],

where Φ are the logits predicted by the neural network and w is a dummy scaling variable (see Arjovsky et al. (2019)). The relevant part is the penalty term λ · ||∇_{w|w=1.0} R^e(w · Φ)||²: one way to interpret it is that the penalty is large on every environment where the distribution output by Φ could be made 'closer' to the distribution of the labels by either sharpening it (w > 1) or softening it (w < 1, i.e. making it closer to uniform).

Let us consider the example from IRM, where the authors describe two datasets of images that each contain either a cow or a camel: in one of the datasets, there is grass in 80% of the images with cows, while in the other dataset there is grass in 90% of them. IRM then makes the point that we can learn to ignore grass as a feature, because its correlation with the label cow is inconsistent (80% vs 90%). The setting we consider in this paper is slightly different: take our example from the CIFAR-10 experiments. Under our concept of invariance, we expect that (depending on the data-generating process) even a single dataset where we treat every image as coming from its own 'environment' should be sufficient to discover invariances. Drawing a connection to the IRM setting, we would argue that the second dataset should not be necessary to learn that 'grass' is not 'cow'. If one treats every example as coming from its own environment, there is already sufficient information in the first dataset to realize that cows are not grass: grass is predictive of cows only in 80% of the data, so grass cannot be 'cow'. The actual cow, on the other hand, should be present in 100% of the images, and as such it is the invariance we are looking for. Note that this is of course a much stricter definition of invariance: if our dataset contains images labeled as 'cows' that have no cows in them, we might start to discard the features of cows as well.
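For comparison with the AND-mask, the IRM penalty for one environment can be written in a few lines. This sketch follows the "dummy classifier w = 1.0" trick described above; the function name is our own.

import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    # logits: [B, n_classes] output Phi of the network on one environment, y: [B] labels
    w = torch.tensor(1.0, requires_grad=True)                # dummy scaling classifier
    risk = F.cross_entropy(logits * w, y)                    # R^e(w * Phi)
    grad = torch.autograd.grad(risk, [w], create_graph=True)[0]
    return (grad ** 2).sum()                                  # || grad_w R^e(w * Phi) ||^2 at w = 1

# Total objective: sum over environments of  risk_e + lambda * irm_penalty(logits_e, y_e)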



This provides a useful simplified perspective. Indeed, this quadratic model is heavily used in the optimization community (see e.g. Jastrzębski et al. (2017); Zhang et al. (2019a); Mandt et al. (2017)). It was shown in (Becker et al., 1988) and recently in (Adolphs et al., 2019; Singh and Alistarh, 2020) that neural networks have a strong diagonal dominance of the Hessian matrix at the end of training.
Smaller than 1/λmax, where λmax is the maximum eigenvalue of the Hessians from different environments.
This holds if θ − θ* is positive; otherwise we have ∇L_∧(θ) = −( Π_{e ∈ E} |∇L_e(θ)| )^{1/|E|}.
See Figure 17 in appendix B.6 for a visualization of the game.
To obtain a robust evaluation, we preferred to use behavioral cloning instead of the full RL problem, as it is a standard supervised learning task and has substantially fewer moving parts than most deep RL algorithms.
For a graphical description, the reader can check http://neuralnetworksanddeeplearning.com/chap4.html
Therefore, c is 1 if the AND-mask has only 1s, and infinite if all components are masked out (which we then keep as 0).
https://github.com/pytorch/ignite/blob/master/examples/contrib/cifar10/fastresnet.py
https://github.com/openai/train-procgen
This would be different for a non-causal factorization of the joint distribution, see Schölkopf (2019).



Figure 1: Loss landscapes of a two-parameter model. Averaging gradients forgoes information that can identify patterns shared across different environments.

Figure 2: Inconsistency in gradient directions.

Figure 3: Plotted are contour lines θᵀ H⁻¹ θ = 1 for H_A = diag(0.05, 1) and H_B = diag(1, 0.05). H_{A∧B} retains the original volumes, while for H_{A+B} it is 5× bigger. This magnification shows the inconsistency of A and B.

Figure 4: Magnitude of the gradient (average or masked) on random data (|θ| = 3000, t = 0.8d).

Figure 6: Results on the synthetic dataset.

Figure 7: Gradient correlations.

Figure 8: As the AND-mask threshold increases, memorization on CIFAR-10 with random labels is quickly hindered.

Figure 9: The AND-mask prevents overfitting to the incorrectly labeled portion of the training set (left) without hurting the test accuracy (right).

Figure 10: Performance of the neural network in Equation 3 for two different parameters. Any reasonable modification of θ_6 (say ±1) leaves the performance on environment A unchanged, while the performance on environment B quickly degrades.

Figure 11: While the arithmetic mean of the two loss surfaces on the left is identical in all three cases (third column), the geometric mean has weaker and weaker gradients (black arrow) the more inconsistent the two loss surfaces become.

Figure 12: Plotted are contour lines θᵀ H⁻¹ θ = 1 for H_A = diag(0.01, 1) and H_B = diag(1, 0.01). It is convenient to provide this visualization because it is linked to the matrix determinant: Vol({θ : θᵀ H⁻¹ θ = 1}) = π √det(H). The geometric average retains the volume of the original ellipses, while the volume for H_{A+B} is 25 times bigger. This magnification indicates that landscape A is not consistent with landscape B.

sampled randomly from trajectories generated by π*. In order to test generalization performance, the BC training dataset is restricted to 64 distinct levels. We generate 1000 examples per training level. The test set consists of 2000 examples, each from a different level which does not appear in the training set.

Figure 17: Screenshots of 6 levels of CoinRun (from OpenAI).

Figure 18: Learning curves for the behavioral cloning experiment on CoinRun. Training loss is shown on the left, test loss is shown on the right. We show the mean over the top-10 runs for each method. The shaded regions correspond to the 95% confidence interval of the mean based on bootstrapping.


Figure 5 shows a concrete example with d_M and d_S equal to 2, and two environments (A and B). The spirals (on d_M) are invariant but hard to model. The shortcuts (on d_S) are simple blobs but different in every environment: in A, linearly separable through a vertical decision boundary; in B, through a horizontal one. If the two environments are pooled, a new diagonal decision boundary emerges on the shortcut dimensions as the most 'natural' one. While this perfectly classifies the data in both environments A and B, critically it would not have been found by training on either partition A or B alone. The out-of-distribution (o.o.d.) test data has the same mechanism but random shortcuts. Therefore, any method relying exclusively on the shortcuts will have chance-level o.o.d. performance. Details about the dataset, baselines, and training curves are reported in appendix B.

Each study consisted of 1024 trials. Despite the theoretical computational efficiency of computing the AND-mask as presented in Section 2.3 (i.e., linear time and memory in the size of the mini-batch, just like classic SGD), current deep learning frameworks like PyTorch (Paszke et al., 2017) have optimized routines that sum gradients across examples in a mini-batch before it is possible to efficiently compute the AND-mask. We therefore test the AND-mask in a slightly different way. In training, in each iteration we sample a batch of data from a randomly chosen level out of the 64 available (and cycle through them all once per epoch). We then apply the AND-mask 'temporally', only allowing gradients that are consistent across time (and therefore across levels). See Algorithm 1 in appendix B.6 for a detailed description of this alternative formulation of the AND-mask. The figure shows the minimum test loss for the 10 best runs, supporting the hypothesis that the AND-mask helps identify invariant mechanisms across different levels.

Hyperparameter ranges for IRM.

Hyperparameters for the 5 best runs using the AND-mask, from the TPE search.

ACKNOWLEDGMENTS

We wish to thank Sebastian Gomez, Luca Biggio, Julius von Kügelgen, Paolo Penna, Ioannis Anagno, Ricards Marcinkevics, Sidak Pal Singh, Damien Teney for feedback on the manuscript, and thank Nando de Freitas for fruitful discussions in the early stage of this project. We also thank the Max Planck ETH Center for Learning Systems for supporting Giambattista Parascandolo, and the International Max Planck Research School for Intelligent Systems for supporting Alexander Neitz.


B APPENDIX TO SECTION 3

We used PyTorch (Paszke et al., 2017) to implement all experiments in this paper. Our codebase is publicly available at https://github.com/gibipara92/learning-explanations-hard-to-vary.

B.1 SECTION 3.1

Here we report more technical details about the synthetic dataset described in Section 3. Each example is constructed as follows: we first choose the label randomly to be either +1 or −1, with equal probability. The example is a vector with d_S + d_M entries, consisting of the shortcut and the mechanism. In our experiments, d_M = 2 and d_S = 32. The Gaussian shortcuts are obtained by first sampling one random vector x_s ∈ R^{d_S} per environment; its components x_{s,i} are sampled independently from a Normal distribution, x_{s,i} ~ N(0, 0.1). We use x_s for class 1, and −x_s for class −1. In the test set, all shortcut components are sampled i.i.d. from the same Normal distribution; effectively, each example of the test set belongs to a different domain. The mechanism is implemented as the two interconnected spirals shown in Figure 13, obtained by sampling the radius r ~ Unif(0.08, 1.0) and then computing the angle as α = 2πnr, where n is the number of revolutions of the spiral. We add uniform noise in the range [−0.02, 0.02] to the radii afterwards.
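The following sketch generates one environment of this dataset under the description above. The number of spiral revolutions, the π offset between the two class spirals, the ordering of shortcut and mechanism dimensions, and reading N(0, 0.1) as a standard deviation are assumptions made purely for illustration.

import numpy as np

def make_environment(n_examples, d_s=32, n_rev=3, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    y = rng.choice([-1, 1], size=n_examples)

    # Mechanism (d_M = 2): two interleaved spirals, one per class.
    r = rng.uniform(0.08, 1.0, size=n_examples)
    alpha = 2 * np.pi * n_rev * r + np.pi * (y == -1)        # offset the second spiral by pi
    r_noisy = r + rng.uniform(-0.02, 0.02, size=n_examples)
    mechanism = np.stack([r_noisy * np.cos(alpha), r_noisy * np.sin(alpha)], axis=1)

    # Shortcut (d_S = 32): a single random vector per environment, sign-flipped by the class.
    x_s = rng.normal(0.0, 0.1, size=d_s)
    shortcut = y[:, None] * x_s[None, :]

    return np.concatenate([shortcut, mechanism], axis=1), y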

B.3 EXPERIMENT

We train all networks for 3000/D epochs, dropping the learning rate by a factor of 10 halfway through training, and again at three-quarters of training. For computational reasons, we stop each trial before completion if the training accuracy exceeds 97% and the test accuracy is below 60%. All networks are MLPs with LeakyReLU activation functions and a cross-entropy loss on the output. We run a hyperparameter search over the ranges shown in Table 1. For IRM and the AND-mask, we select the best-performing run and re-run it 50 times with different random seeds. For DANN and the standard baselines, nothing produced results significantly better than chance.

B.3.1 STANDARD REGULARIZERS AND AND-MASK

The networks with the L1, L2, Dropout and Batch-normalization regularizers have hyperparameters that were randomly selected from Table 1. For the AND-mask we used the very same ranges. The regularizers L1 and L2 are never combined; instead, one weight regularization type out of L1, L2 and none is selected, and we sample from the respective range afterwards. The parameters found to work best in the grid search were: agreement threshold of 1, 256 hidden units, 3 hidden layers, batch size 128, Adam with learning rate 1e-2, no batch norm, no dropout, L2 regularization with a coefficient of 1e-4, and no L1 regularization.

In practice, we often found it helpful to rescale the gradients after masking to compensate for the decreasing overall magnitude. We add the option for gradient rescaling as an additional hyperparameter, as we found it to help in several experiments. It rescales gradient components layer-wise after masking, by multiplying the remaining gradient components by c, where c is the ratio of the number of components in that layer over the number of non-masked components in that layer (i.e. the sum of the binary elements in the mask). We speculate that for very large layers, a less extreme normalization scheme or the additional use of gradient clipping might be appropriate.

B.4 FURTHER VISUALIZATIONS AND EXPERIMENTS

In Figure 15 we show how many environments need to be present for the baseline without the AND-mask to switch its decision boundary from the shortcuts to the mechanism. Under the same experimental conditions as in the main paper, the baseline first succeeds at 1024 environments.

Memorization experiment. In Figure 16 we report the test performance (dashed lines) corresponding to the curves presented in the main paper for the CIFAR-10 memorization experiment. The test performance with standard labels decreases more slowly than the training performance as the threshold increases, and they eventually reach the same value. This is consistent with the hypothesis that, by training on the consistent directions, the AND-mask selects the invariant patterns and prunes out the signals that are not invariant.

Network architecture and training details. Each trial trains the ResNet "FastResNet" from the PyTorch-Ignite examples for 80 epochs on the full CIFAR-10 training set. We use the Adam optimizer with a learning rate of 5e-4, and a 0.1× learning rate decay at epochs 40 and 60. We fix the batch size to 80. We set up 14 trials by evaluating each of the AND-mask thresholds {0, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8} for two datasets: (a) unchanged CIFAR-10, (b) CIFAR-10 with the training labels replaced by random labels. Note that a threshold of 0 corresponds to not using the AND-mask. Each trial is run twice with separate random seeds.

Label noise experiment. We trained the same ResNet as for the experiment above, once with and once without the AND-mask. We ran each experiment with three different starting learning rates {5e-4, 1e-3, 5e-3} and a learning rate decay at epoch 60. The baseline worked best with a learning rate of 1e-3, while the AND-mask worked best with 5e-3, likely to compensate for the masked-out gradients. The AND-mask threshold that worked best was 0.2, which is consistent with the results obtained in the experiment above.

