CURI: A BENCHMARK FOR PRODUCTIVE CONCEPT LEARNING UNDER UNCERTAINTY

Abstract

Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head"). In contrast, standard classification benchmarks: 1) consider only a fixed set of category labels, 2) do not evaluate compositional concept learning, and 3) do not explicitly capture a notion of reasoning under uncertainty. We introduce a new few-shot, meta-learning benchmark, Compositional Reasoning Under Uncertainty (CURI), to bridge this gap. CURI evaluates different aspects of productive and systematic generalization, including disentangling, productive generalization, learning Boolean operations, and variable binding. Importantly, it also defines a model-independent "compositionality gap" to evaluate the difficulty of generalizing out-of-distribution along each of these axes. Extensive evaluations across a range of modeling choices spanning different modalities (images, schemas, and sounds), splits, privileged auxiliary concept information, and choices of negatives reveal substantial scope for modeling advances on the proposed task. All code and datasets will be available online.

1. INTRODUCTION

Human concept learning is more flexible than today's AI systems. Human conceptual knowledge is productive: people can understand and generate novel concepts via compositions of existing concepts ("an apartment dog") (Murphy, 2002), unlike standard machine classifiers that are limited to a fixed set of classes ("dog", "cat", etc.). Further, humans can induce goal-based, "ad hoc" categories such as "things to take from one's apartment in a fire" (children, dogs, keepsakes, etc.) (Barsalou, 1983). Thus, unlike AI systems, humans reason seamlessly in large, essentially "unbounded" concept spaces. Beyond unboundedness, a natural challenge in such concept spaces is uncertainty: the right concept to be inferred is uncertain, as a plethora of candidate concepts could explain the observations. For example, in Figure 1 (top, image panel), the "right" concept could be that "All objects are blue and have the same size", but it could also be "There are fewer than four objects in the scene", or "All objects have the same color". Humans gracefully handle such uncertainty and underdetermination (Tenenbaum & Griffiths, 2001; Xu & Tenenbaum, 2007; Goodman et al., 2008; Piantadosi et al., 2016).

Popular compositional reasoning benchmarks such as CLEVR (Johnson et al., 2016) for visual question answering and Raven's Progressive Matrices (Santoro et al., 2017) for deductive, analogical reasoning are compositionally rich and challenging, but do not tackle ambiguity and underdetermination. We address this gap in the literature and propose the Compositional Reasoning Under Uncertainty (CURI) benchmark to study how modern machine learning systems can learn concepts spanning a large, productively defined space (Figure 1). In pursuit of this goal, we instantiate a meta-learning task where a model must acquire a compositional concept from finite samples. A signature of productivity in human thought is our ability to handle novel combinations of known, atomic components.
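To make the underdetermination concrete, the following is a minimal, hypothetical sketch (the scene encoding, property names, and hypothesis pool are illustrative, not the benchmark's actual format) showing how several candidate concepts can all be consistent with the same observed scene:

```python
# Toy illustration (hypothetical encoding): a scene is a list of objects,
# each a dict of properties. Several candidate concepts, expressed as
# predicates over a scene, can all explain the same positive example.
scene = [
    {"color": "blue", "size": "small", "shape": "square"},
    {"color": "blue", "size": "small", "shape": "circle"},
    {"color": "blue", "size": "small", "shape": "square"},
]

hypotheses = {
    "all objects are blue and have the same size":
        lambda s: all(o["color"] == "blue" for o in s)
                  and len({o["size"] for o in s}) == 1,
    "there are fewer than four objects":
        lambda s: len(s) < 4,
    "all objects have the same color":
        lambda s: len({o["color"] for o in s}) == 1,
    "all objects are squares":
        lambda s: all(o["shape"] == "square" for o in s),
}

# Every hypothesis consistent with the scene remains a live candidate.
consistent = [name for name, h in hypotheses.items() if h(scene)]
print(consistent)  # the first three hypotheses all explain the scene
```

With finite examples, no single hypothesis can be singled out; the learner must cope with all surviving candidates at once.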
Thus, in CURI we instantiate different systematic train-test splits to analyze different forms of generalization in concept learning, involving novel combinations of intrinsic properties (e.g. color, shape) with Boolean operators, counting, extrinsic object properties (e.g. object location), and a novel test of variable binding in the context of compositional learning. While related systematic splits have been proposed in prior work in the context of other tasks such as question answering and analogical reasoning (Barrett et al., 2018; Hill et al., 2019; Agrawal et al., 2017; Johnson et al., 2016; Vedantam et al., 2017; Higgins et al., 2017; Bakhtin et al., 2019; Lake & Baroni, 2018; Ruis et al., 2020), ours is the first benchmark which tests different qualitative aspects of reasoning about productive concepts under uncertainty.

[Figure 1: Concept Space. Three example concepts (rows) along with schematic positive examples. Actual scenes are rendered in multiple ways including the CLEVR renderer (Johnson et al., 2016) (see Figure 2). Left: example concepts and their context-free-grammar forms:
- "All objects are blue and have the same size": for-all x \in S (color?(x) = "blue") and (all (size?(S) = size?(x)))
- "All objects in the scene have the same color": for-all x \in S (all (color?(x) = color?(S)))
- "There exists a blue object in the scene and the rest of the objects are squares": exists x \in S (color?(x) = "blue") and all (shape?(S{-x}) = "square")
Variables: x is an object in the scene, S is the set of all objects, and S{-x} denotes S/{x}. Right: the grammar of variables, quantifiers, functions and operators used to induce compositional concepts.]

Compositional Reasoning Under Uncertainty (CURI) Task. Concretely, the CURI task tests few-shot learning of relational concepts in a large compositional conceptual space, with design inspiration from studies in cognitive modeling using a language of thought (LOT) approach (Fodor, 1975; Piantadosi, 2011; Kemp et al., 2005). CURI includes scene-based concepts such as "All objects have the same color" and "There exists a blue object while the rest are triangles" (Figure 1), but unlike CLEVR (Johnson et al., 2016) there are too few examples to deduce answers with certainty. Our benchmark is defined through a series of meta-learning episodes (see example in Figure 2): given positive and negative examples of a new concept, D_supp (known as the "support set"), the goal of an episode is to classify new examples D_query (the "query set"). As in few-shot classification (Fei-Fei et al., 2006), meta-learning (Vinyals et al., 2016), and other open-set tasks (Lampert et al., 2014), models are evaluated on novel classes outside the (meta-)training set.
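The episode structure just described can be sketched as follows; this is an illustrative toy implementation (the `Episode` container, `predict` learner, and two-hypothesis pool are hypothetical, not CURI's actual code or data format):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Scene = List[Dict[str, str]]  # a scene is a list of objects (hypothetical encoding)
Example = Tuple[Scene, int]   # (scene, label); label 1 = positive, 0 = negative

@dataclass
class Episode:
    """One meta-learning episode: learn from the support set, classify the query set."""
    support: List[Example]
    query: List[Example]

# A tiny hypothesis pool; CURI's real space is a large grammar of concepts.
HYPOTHESES = [
    lambda s: all(o["color"] == "blue" for o in s),    # "all objects are blue"
    lambda s: all(o["shape"] == "square" for o in s),  # "all objects are squares"
]

def predict(support: List[Example], scene: Scene) -> int:
    """Keep hypotheses consistent with every support label; predict positive
    if any surviving hypothesis accepts the scene."""
    live = [h for h in HYPOTHESES if all(h(s) == bool(y) for s, y in support)]
    return int(any(h(scene) for h in live))

def episode_accuracy(ep: Episode) -> float:
    """Fraction of query scenes labeled correctly given the support set."""
    return sum(predict(ep.support, x) == y for x, y in ep.query) / len(ep.query)

ep = Episode(
    support=[([{"color": "blue", "shape": "square"}], 1),
             ([{"color": "red", "shape": "square"}], 0)],
    query=[([{"color": "blue", "shape": "circle"}], 1),
           ([{"color": "green", "shape": "square"}], 0)],
)
print(episode_accuracy(ep))  # 1.0: only "all objects are blue" survives the support set
```

Note how the negative support example does the disambiguating work: it rules out "all objects are squares", leaving a single consistent concept for this toy pool.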
Unlike previous work (Triantafillou et al., 2019; Lake et al., 2019) that focuses on atomic concepts, our benchmark concerns more structured, relational concepts built compositionally from a set of atomic concepts, and involves reasoning under uncertainty: an ideal learner must marginalize over many hypotheses when making predictions (Gelman et al., 2004; Xu & Tenenbaum, 2007; Piantadosi et al., 2016).

[Figure 2: An example meta-learning episode for a novel productive concept ("There exists a blue object in the scene, and the rest of the objects are all cylindrical in shape"): given the labeled support set D_supp, the model predicts labels y for query examples u in D_query.]

We also vary the modality in which scenes are presented, rendering them as images, symbolic schemas, and sounds, enabling future research on modality-specific representational choices for compositional reasoning under uncertainty. Finally, we vary the concepts learned by the model during meta-training and meta-testing to test different aspects of systematic generalization.
G P 7 A + f w B h I m N 0 g = = < / l a t e x i t > Label y : (1 or 0) < l a t e x i t s h a 1 _ b a s e 6 4 = " + K K V 0 d V F F F P u x M 5 E i 2 G m v w W 3 e s g = " > A A A C C 3 i c b Z A 7 S w N B E M f 3 f M b 4 i l r a L A l C b M K d C I p V 0 M b C I o J 5 Q H K E v c 1 c s m T v w e 6 c e B z p b f w q N h a K 2 P o F 7 P w 2 b h 6 F J k 7 1 5 / e f Y W b + X i y F R t v + t p a W V 1 b X 1 n M b + c 2 t 7 Z 3 d w t 5 + Q 0 e J 4 l D n k Y x U y 2 M a p A i h j g I l t G I F L P A k N L 3 h 1 d h v 3 o P S I g r v M I 3 B D V g / F L 7 g D A 3 q F o o d h A f M b p g H k o 5 o e k H L D p 0 y G i l D 7 O N u o W R X 7 E n R R e H M R I n M q t Y t f H V 6 E U 8 C C J F L p n X b s W N 0 M 6 Z Q c A m j f C f R E D M + Z H 1 o G x m y A L S b T X 4 Z 0 S N D e t Q 3 u / 0 o R D q h v y c y F m i d B p 7 p D B g O 9 L w 3 h v 9 5 7 Q T 9 c z c T Y Z w g h H y 6 y E 8 k x Y i O g 6 E 9 o Y C j T I 1 g X A l z K + U D p h h H E 1 / e h O D M v 7 w o G i c V x 6 4 4 t 6 e l 6 u U s j h w 5 J E V S J g 4 5 I 1 V y T W q k T j h 5 J M / k l b x Z T 9 a L 9 W 5 9 T F u X r N n M A f l T 1 u c P O O y Z N w = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " + K K V 0 d V F F F P u x M 5 E i 2 G m v w W 3 e s g = " > A A A C C 3 i c b Z A 7 S w N B E M f 3 f M b 4 i l r a L A l C b M K d C I p V 0 M b C I o J 5 Q H K E v c 1 c s m T v w e 6 c e B z p b f w q N h a K 2 P o F 7 P w 2 b h 6 F J k 7 1 5 / e f Y W b + X i y F R t v + t p a W V 1 b X 1 n M b + c 2 t 7 Z 3 d w t 5 + Q 0 e J 4 l D n k Y x U y 2 M a p A i h j g I l t G I F L P A k N L 3 h 1 d h v 3 o P S I g r v M I 3 B D V g / F L 7 g D A 3 q F o o d h A f M b p g H k o 5 o e k H L D p 0 y G i l D 7 O N u o W R X 7 E n R R e H M R I n M q t Y t f H V 6 E U 8 C C J F L p n X b s W N 0 M 6 Z Q c A m j f C f R E D M + Z H 1 o G x m y A L S b T X 4 Z 0 S N D e t Q 3 u / 0 o R D q h v y c y F m i d B p 7 p D B g O 9 L w 3 h v 9 5 7 Q T 9 c z c T Y Z w g h H y 6 y E 8 k x Y i O g 6 E 9 o Y C j 
T I 1 g X A l z K + U D p h h H E 1 / e h O D M v 7 w o G i c V x 6 4 4 t 6 e l 6 u U s j h w 5 J E V S J g 4 5 I 1 V y T W q k T j h 5 J M / k l b x Z T 9 a L 9 W 5 9 T F u X r N n M A f l T 1 u c P O O y Z N w = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " + K K V 0 d V F F F P u x M 5 E i 2 G m v w W 3 e s g = " > A A A C C 3 i c b Z A 7 S w N B E M f 3 f M b 4 i l r a L A l C b M K d C I p V 0 M b C I o J 5 Q H K E v c 1 c s m T v w e 6 c e B z p b f w q N h a K 2 P o F 7 P w 2 b h 6 F J k 7 1 5 / e f Y W b + X i y F R t v + t p a W V 1 b X 1 n M b + c 2 t 7 Z 3 d w t 5 + Q 0 e J 4 l D n k Y x U y 2 M a p A i h j g I l t G I F L P A k N L 3 h 1 d h v 3 o P S I g r v M I 3 B D V g / F L 7 g D A 3 q F o o d h A f M b p g H k o 5 o e k H L D p 0 y G i l D 7 O N u o W R X 7 E n R R e H M R I n M q t Y t f H V 6 E U 8 C C J F L p n X b s W N 0 M 6 Z Q c A m j f C f R E D M + Z H 1 o G x m y A L S b T X 4 Z 0 S N D e t Q 3 u / 0 o R D q h v y c y F m i d B p 7 p D B g O 9 L w 3 h v 9 5 7 Q T 9 c z c T Y Z w g h H y 6 y E 8 k x Y i O g 6 E 9 o Y C j T I 1 g X A l z K + U D p h h H E 1 / e h O D M v 7 w o G i c V x 6 4 4 t 6 e l 6 u U s j h w 5 J E V S J g 4 5 I 1 V y T W q k T j h 5 J M / k l b x Z T 9 a L 9 W 5 9 T F u X r N n M A f l T 1 u c P O O y Z N w = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " + K K V 0 d V F F F P u x M 5 E i 2 G m v w W 3 e s g = " > A A A C C 3 i c b Z A 7 S w N B E M f 3 f M b 4 i l r a L A l C b M K d C I p V 0 M b C I o J 5 Q H K E v c 1 c s m T v w e 6 c e B z p b f w q N h a K 2 P o F 7 P w 2 b h 6 F J k 7 1 5 / e f Y W b + X i y F R t v + t p a W V 1 b X 1 n M b + c 2 t 7 Z 3 d w t 5 + Q 0 e J 4 l D n k Y x U y 2 M a p A i h j g I l t G I F L P A k N L 3 h 1 d h v 3 o P S I g r v M I 3 B D V g / F L 7 g D A 3 q F o o d h A f M b p g H k o 5 o e k H L D p 0 y G i l D 7 O N u o W R X 7 E n R R e H M R I n M q t Y t f H V 6 E U 8 C C J F L p n X b s W N 0 M 6 Z Q c A m j f C f R E D M + Z H 1 o G x 
m y A L S b T X 4 Z 0 S N D e t Q 3 u / 0 o R D q h v y c y F m i d B p 7 p D B g O 9 L w 3 h v 9 5 7 Q T 9 c z c T Y Z w g h H y 6 y E 8 k x Y i O g 6 E 9 o Y C j T I 1 g X A l z K + U D p h h H E 1 / e h O D M v 7 w o G i c V x 6 4 4 t 6 e l 6 u U s j h w 5 J E V S J g 4 5 I 1 V y T W q k T j h 5 J M / k l b x Z T 9 a L 9 W 5 9 T F u X r N n M A f l T 1 u c P O O y Z N w = = < / l a t e x i t > u < l a t e x i t s h a 1 _ b a s e 6 4 = " T 1 u A 6 W 5 + 3 x w x D K D 7 Y a x K k P t W l g 0 = " > A A A B 8 X i c b V D L S s N A F L 2 p r 1 p f V Z d u B o v g q i Q i 6 L L o x m U F + 8 A 2 l M l 0 0 g 6 d T M L M j V B C / 8 K N C 0 X c + j f u / B s n b R b a e m D g c M 6 9 z L k n S K Q w 6 L r f T m l t f W N z q 7 x d 2 d n d 2 z + o H h 6 1 T Z x q x l s s l r H u B t R w K R R v o U D J u 4 n m N A o k 7 w S T 2 9 z v P H F t R K w e c J p w P 6 I j J U L B K F r p s R 9 R H A d h l s 4 G 1 Z p b d + c g q 8 Q r S A 0 K N A f V r / 4 w Z m n E F T J J j e l 5 b o J + R j U K J v m s 0 k 8 N T y i b 0 B H v W a p o x I 2 f z R P P y J l V h i S M t X 0 K y V z 9 v Z H R y J h p F N j J P K F Z 9 n L x P 6 + X Y n j t Z 0 I l K X L F F h + F q S Q Y k / x 8 M h S a M 5 R T S y j T w m Y l b E w 1 Z W h L q t g S v O W T V 0 n 7 o u 6 5 d e / + s t a 4 K e o o w w m c w j l 4 c A U N u I M m t I C B g m d 4 h T f H O C / O u / O x G C 0 5 x c 4 x / I H z + Q P 4 q p E Z < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " T 1 u A 6 W 5 + 3 x w x D K D 7 Y a x K k P t W l g 0 = " > A A A B 8 X i c b V D L S s N A F L 2 p r 1 p f V Z d u B o v g q i Q i 6 L L o x m U F + 8 A 2 l M l 0 0 g 6 d T M L M j V B C / 8 K N C 0 X c + j f u / B s n b R b a e m D g c M 6 9 z L k n S K Q w 6 L r f T m l t f W N z q 7 x d 2 d n d 2 z + o H h 6 1 T Z x q x l s s l r H u B t R w K R R v o U D J u 4 n m N A o k 7 w S T 2 9 z v P H F t R K w e c J p w P 6 I j J U L B K F r p s R 9 R H A d h l s 4 G 1 Z p b d + c g q 8 Q r S A 0 K N A f V r / 4 w Z m n E F T J J j e l 
5 b o J + R j U K J v m s 0 k 8 N T y i b 0 B H v W a p o x I 2 f z R P P y J l V h i S M t X 0 K y V z 9 v Z H R y J h p F N j J P K F Z 9 n L x P 6 + X Y n j t Z 0 I l K X L F F h + F q S Q Y k / x 8 M h S a M 5 R T S y j T w m Y l b E w 1 Z W h L q t g S v O W T V 0 n 7 o u 6 5 d e / + s t a 4 K e o o w w m c w j l 4 c A U N u I M m t I C B g m d 4 h T f H O C / O u / O x G C 0 5 x c 4 x / I H z + Q P 4 q p E Z < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " T 1 u A 6 W 5 + 3 x w x D K D 7 Y a x K k P t W l g 0 = " > A A A B 8 X i c b V D L S s N A F L 2 p r 1 p f V Z d u B o v g q i Q i 6 L L o x m U F + 8 A 2 l M l 0 0 g 6 d T M L M j V B C / 8 K N C 0 X c + j f u / B s n b R b a e m D g c M 6 9 z L k n S K Q w 6 L r f T m l t f W N z q 7 x d 2 d n d 2 z + o H h 6 1 T Z x q x l s s l r H u B t R w K R R v o U D J u 4 n m N A o k 7 w S T 2 9 z v P H F t R K w e c J p w P 6 I j J U L B K F r p s R 9 R H A d h l s 4 G 1 Z p b d + c g q 8 Q r S A 0 K N A f V r / 4 w Z m n E F T J J j e l 5 b o J + R j U K J v m s 0 k 8 N T y i b 0 B H v W a p o x I 2 f z R P P y J l V h i S M t X 0 K y V z 9 v Z H R y J h p F N j J P K F Z 9 n L x P 6 + X Y n j t Z 0 I l K X L F F h + F q S Q Y k / x 8 M h S a M 5 R T S y j T w m Y l b E w 1 Z W h L q t g S v O W T V 0 n 7 o u 6 5 d e / + s t a 4 K e o o w w m c w j l 4 c A U N u I M m t I C B g m d 4 h T f H O C / O u / O x G C 0 5 x c 4 x / I H z + Q P 4 q p E Z < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " T 1 u A 6 W 5 + 3 x w x D K D 7 Y a x K k P t W l g 0 = " > A A A B 8 X i c b V D L S s N A F L 2 p r 1 p f V Z d u B o v g q i Q i 6 L L o x m U F + 8 A 2 l M l 0 0 g 6 d T M L M j V B C / 8 K N C 0 X c + j f u / B s n b R b a e m D g c M 6 9 z L k n S K Q w 6 L r f T m l t f W N z q 7 x d 2 d n d 2 z + o H h 6 1 T Z x q x l s s l r H u B t R w K R R v o U D J u 4 n m N A o k 7 w S T 2 9 z v P H F t R K w e c J p w P 6 I j J U L B K F r p s R 9 R H A d h l s 4 G 1 Z p b d + c g q 8 Q r S A 0 K N 
A f V r / 4 w Z m n E F T J J j e l 5 b o J + R j U K J v m s 0 k 8 N T y i b 0 B H v W a p o x I 2 f z R P P y J l V h i S M t X 0 K y V z 9 v Z H R y J h p F N j J P K F Z 9 n L x P 6 + X Y n j t Z 0 I l K X L F F h + F q S Q Y k / x 8 M h S a M 5 R T S y j T w m Y l b E w 1 Z W h L q t g S v O W T V Compositionality Gap. In addition to defining systematic splits, we also characterize (for the first time, in our knowledge), the difficulty of generalization entailed by each split by introducing the notion of a model-independent "compositionality gap". Concretely, the compositionality gap is the difference in test performance between an ideal Bayesian learner with access to the full hypothesis space, and a Bayesian learner with access to only a (potentially large) list of the hypotheses examined during meta-training. A large gap indicates that any learner must extrapolate compositionally from the training hypotheses to solve the task; additionally, models can be compared to ideal learners that either do or do not engage in such extrapolation. We anticipate that this tool will be more broadly useful for analyzing other benchmarks with compositional splits. Models. We evaluate models around various dimensions which concern the difficulty of learning productive concepts under uncertainty, including: 1) the modality in which the input is rendered (image, schemas, sounds), 2) method used for reasoning across objects in a scene (transformer, 

2. RELATED WORK

Compositional Learning. Related work has examined systematic generalization in pattern completion using Raven's matrices (PGM) (Santoro et al., 2017; Hill et al., 2019) and visual question answering with CLEVR (Johnson et al., 2016; Bahdanau et al., 2019). CURI's use of the CLEVR renderer further invites particular comparison with that benchmark. Compared to these more deductive reasoning tests, CURI examines few-shot concept learning under substantial inherent uncertainty. Unlike puzzle solving or question answering, an ideal inductive learner on CURI cannot know the right rule with certainty. In essence, unlike CLEVR, the "question" to be answered is not given to the model as input but must be inferred, making the task more challenging. While PGMs do involve such an inference, once the constraints of a puzzle are identified, there is 1) no remaining uncertainty in the reasoning (which is crucial to our setting) and 2) no "concept" learning, in which a concept applies to multiple images, so much as "instance" matching to complete a sequence. In contrast, a successful CURI model behaves as if marginalizing over many hypotheses consistent with the observations, e.g., (Tenenbaum & Griffiths, 2001; Xu & Tenenbaum, 2007; Piantadosi et al., 2016), an ability which is rarely studied directly in deep learning models (although see (Grant et al., 2019)). Recently, Keysers et al. (2019) proposed a method to create "difficult" systematic splits based on the principle that they should share atoms but have maximally different compositions. This is complementary to our splits, which provide interpretable notions of what each split tests, such as disentangling, complexity, and variable binding. Moreover, our variable binding split is predicated on having different atoms between train and test, and thus cannot be recovered by their methodology. Language of Thought (LOT).
Our choice of compositional concepts was most closely inspired by Piantadosi et al. (2016), along with other studies of human concept learning in the Language of Thought (LOT) framework (Fodor, 1975; Goodman et al., 2008; Kemp & Jern, 2009; Piantadosi et al., 2012; Goodman et al., 2015; Overlan et al., 2017; Lake & Piantadosi, 2019). In typical LOT studies of human learning, the conceptual space H is defined through a probabilistic context-free grammar G, which specifies a set of conceptual primitives and their rules of combination. Here, we use a LOT-inspired grammar G to generate an unbounded set of concepts H, while evaluating machine learning models trained without access to the underlying LOT.
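A probabilistic context-free grammar of this kind can be expanded recursively to yield an unbounded space of concept strings. The sketch below uses a small toy grammar of our own (not the actual CURI grammar G) purely to illustrate the sampling process:

```python
import random

# Toy LOT-style grammar (illustrative only, not the actual CURI grammar G).
# Each non-terminal maps to a list of (production, probability) pairs.
GRAMMAR = {
    "CONCEPT": [(["PRED"], 0.5),
                (["PRED", "and", "CONCEPT"], 0.25),
                (["PRED", "or", "CONCEPT"], 0.25)],
    "PRED": [(["color?(x)", "=", "COLOR"], 0.5),
             (["shape?(x)", "=", "SHAPE"], 0.5)],
    "COLOR": [(["blue"], 0.34), (["red"], 0.33), (["green"], 0.33)],
    "SHAPE": [(["square"], 0.5), (["cylinder"], 0.5)],
}

def sample(symbol="CONCEPT", rng=random):
    """Recursively expand `symbol` into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    productions, weights = zip(*GRAMMAR[symbol])
    production = rng.choices(productions, weights=weights)[0]
    tokens = []
    for s in production:
        tokens.extend(sample(s, rng))
    return tokens

print(" ".join(sample()))  # e.g. "color?(x) = blue and shape?(x) = square"
```

Because the CONCEPT rule can recurse, repeated sampling produces concepts of unbounded length, which is what makes splits by concept complexity possible.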

3. COMPOSITIONAL REASONING UNDER UNCERTAINTY (CURI) DATASET

Concept space. The compositional concepts in CURI were inspired by the empirical and cognitive modeling work of Piantadosi et al. (2016). The space of concepts (LOT) is defined by a context-free grammar (G). Figure 3 shows the LOT and specifies how primitives and functions compose to produce a large, unbounded concept space. The LOT has three variables: x, representing an object in a scene; S = {x_i}_{i=1}^N, representing the set of all objects in the scene; and S_-x = S \ {x}, representing the set of all objects in the scene except x. Each concept describes a rule composed of object and scene properties, logical operators, and/or comparison operators, and can be evaluated on a given scene S to determine whether the scene satisfies the rule. Object and scene properties are defined by functions which can be applied to objects or scenes: for example, size?(x) yields the size of an object x, while size?(S) returns a set with the sizes of all the objects ({size?(x) : x ∈ S}). Comparison and logical operators can be used to compare and relate various properties of objects in scenes. In contrast to Piantadosi et al. (2016), we include a count operator, which determines how many times a condition is satisfied by a set and allows us to check how well deep learning models are able to count (Chattopadhyay et al., 2016; Johnson et al., 2016; Agrawal et al., 2017). Finally, quantifiers such as exists and for-all enrich the LOT by specifying the number of objects which must satisfy a given condition. Consider the following example concept (Figure 1 bottom): "There exists a blue object in the scene and the rest of the objects are squares." To access the color of a given object, we use color?(x), and to access the shape of a given object, we use shape?(x). To determine whether an object matches a specific property, we can combine this with equality: shape?(x) = "square".
Finally, we can use exists to specify that at least one object must be blue, S_-x to specify all the objects except for that blue object, and all to specify that all the objects in S_-x must be squares. Putting it all together: exists x ∈ S (color?(x) = "blue") and all(shape?(S_-x) = "square"). Structured Generalization Splits. A signature of productivity is the ability to handle novel combinations of known components (Fodor, 1975; Fodor & Pylyshyn, 1988). Thus, in CURI, we consider splits that require generalizing to novel combinations of known elements from our LOT (Figure 3), including combinations of constants, variables, and functions. We achieve this by creating disjoint splits of concepts H_train and H_test for training and evaluating models. By varying the held-out elements and their combinations, we obtain splits that evaluate different axes of generalization. In practice, we use our grammar G to sample and filter a large set of concepts (see Appendix B.2 for more details), which yields a set of 14,929 concepts H for training and evaluation. We next describe how each split divides H into H_train and H_test, to test productive, out-of-distribution generalization: • Instance IID: Evaluates generalization to novel episodes from the same concept set. This is the standard setup in machine learning (Murphy, 2013), in which H_train = H_test. This is the only split where train and test concepts overlap. • Complexity: Evaluates generalization from simpler concepts (less than or equal to 10 symbols) to more complex concepts (longer than 10 symbols). This is indicative of the productivity (Fodor, 1975) exhibited by models, in generalizing from simpler concepts to more complex concepts. • Variable binding: Evaluates learning of entirely novel intrinsic properties, e.g. the training concepts involve only "red", "blue", and "green" but test concepts involve "yellow" (although "yellow" objects can still appear in training scenes).
This is indicative of inferential coherence (Fodor, 1975) in models, in generalizing rules of inference to novel atoms. A model that infers the underlying LOT during meta-training would be expected to perform well on any such systematic split. By comparing the performance of current models to such ideal learners, this benchmark allows us to evaluate progress on the systematic, out-of-distribution generalization capabilities of current models. Appendix C provides more details on the structured splits. From Concepts to Meta-learning Episodes. A single episode comprises a support set (D_supp) and a query set (D_query), each of which is generated from a given concept h. Formally, a support or query set D has input data u and corresponding labels y, i.e. D = {{y_i}_{i=1}^N, {u_i}_{i=1}^N}. Each support and query set contains 5 positive and 20 negative examples; negative examples are oversampled since the space of negatives is generally much larger than that of positives. Positive examples are sampled uniformly from a categorical distribution over all positives. However, we consider two types of negatives: 1) easy negatives, in which the negatives are also sampled at random, and 2) hard negatives, in which negatives are generated from a closely related concept which also evaluates true on the positive examples in D_supp, such that these negatives are maximally confusing. Altogether, for each split, our train, validation, and test sets contain 500,000, 5,000, and 20,000 episodes, respectively. Compositionality Gap. A key aspect of our benchmark is to quantify the difficulty in learning that arises from the compositional structure of the concept space. Most of the splits above are structured such that H_test ∩ H_train = ∅, forcing a learner to use the compositional structure of the concept space to generalize to H_test. We conceptualize the difficulty of this task through the notion of its compositionality gap.
Intuitively, the compositionality gap captures the difference in generalization performance between an ideal compositional learner (strong oracle) and an ideal non-compositional learner that is unable to extrapolate outside the training concepts (weak oracle). Formally, let Ω ∈ {strong, weak} denote an oracle over a concept space H_Ω. The posterior predictive distribution of an oracle for query scene u and query label y ∈ {0, 1} is then given as: p_Ω(y | u, D_supp) = Σ_{h ∈ H_Ω} p_Ω(y | h, u) p_Ω(h | D_supp), where p_Ω(h | D_supp) ∝ p_Ω(h) p({y_i}_{i=1}^N | h; {u_i}_{i=1}^N) and p_Ω(h) denote the posterior and prior, respectively. Given a metric of interest M (e.g., mean average precision or accuracy), the compositionality gap of a learning task is then simply defined as the difference in performance of the strong and weak oracles when evaluating on concepts from H_test, i.e., M(p_strong) - M(p_weak). Using this notion of compositionality gap, we can then define the ideal learners, i.e., the strong and weak oracles, simply via their priors. In particular, let w(h) denote a weight on the importance of each hypothesis and let I denote the indicator function. We then define the prior of an oracle as p_Ω(h) = Σ_{h' ∈ H_Ω} w(h') I[h = h']. The difference between the strong and weak oracle lies in which concepts their priors can access. In this formalism, the strong oracle has access to the union of train and test concepts; that is, H_strong = H_train ∪ H_test. The weak oracle, on the other hand, only has access to H_weak = H_train, which means it is unable to consider any hypothesis outside those seen in training and assigns them zero probability mass. Given a support set D_supp, this difference in priors leads to different posterior inferences and allows us to quantify how compositionally novel a learning task is relative to these ideal learners.
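The two oracles can be made concrete in a few lines. The sketch below uses a toy hypothesis space of our own devising (not the CURI LOT), uniform weights w(h), and noiseless likelihoods, so that hypotheses inconsistent with the support set receive zero posterior mass:

```python
# Sketch of the strong vs. weak oracle on a toy hypothesis space.
# A hypothesis is a predicate over a scene; a scene is a list of objects.

def all_blue(scene):     return all(o["color"] == "blue" for o in scene)
def all_small(scene):    return all(o["size"] == "small" for o in scene)
def fewer_than_3(scene): return len(scene) < 3

H_train = [all_blue, all_small]
H_test  = [fewer_than_3]  # held-out concept

def posterior_predictive(H, support, query_scene):
    """p(y=1 | query_scene, support) for a uniform-prior Bayesian oracle.
    With a noiseless likelihood, hypotheses that mislabel any support
    example are eliminated; the rest are averaged."""
    consistent = [h for h in H if all(h(u) == y for u, y in support)]
    if not consistent:
        return 0.5  # no surviving hypothesis: fall back to chance
    return sum(h(query_scene) for h in consistent) / len(consistent)

# Support set labeled by the held-out concept `fewer_than_3`.
support = [([{"color": "red", "size": "large"}], 1),
           ([{"color": "blue", "size": "small"}] * 4, 0)]
query = [{"color": "green", "size": "large"}] * 2  # true label: 1

strong = posterior_predictive(H_train + H_test, support, query)
weak   = posterior_predictive(H_train, support, query)
print(strong, weak)  # -> 1.0 0.5
```

Here the strong oracle recovers the held-out concept exactly, while the weak oracle, whose prior puts zero mass outside H_train, is reduced to chance; the gap M(p_strong) - M(p_weak) on the chosen metric quantifies exactly this deficit.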

4. METRICS AND BASELINES

During meta-test, given D_supp, models are evaluated on their ability to learn novel concepts. We use two metrics for quantifying this: 1) Accuracy: evaluates the accuracy of model predictions across the query set D_query, as is standard practice in meta-learning (Lake et al., 2019; Snell et al., 2017). Since there are more negative than positive labels, we report class-balanced accuracy for better interpretability, averaging the accuracies on the positive and negative query examples; and 2) mean Average Precision (mAP): evaluates models on a much larger number of test scenes T for each episode (comprising 44,787 scenes, three per concept in H). This addresses the issue that, with a small query set, a model could achieve perfect accuracy without truly grasping the concept. Since episodes typically have many more negative than positive examples, Average Precision sweeps over thresholds of a model's score and reports the average of the precision values at different recall rates, e.g., (Everingham et al., 2010). mAP is then the mean across all of the meta-test episodes.
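Both metrics have simple reference forms. The sketch below is a minimal, simplified implementation (not the benchmark's evaluation code): class-balanced accuracy averages per-class accuracies, and average precision averages the precision attained at the rank of each positive when examples are sorted by score:

```python
# Minimal reference implementations of the two metrics (simplified sketch).

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class accuracies for binary labels in {0, 1}."""
    per_class = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        per_class.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(per_class) / len(per_class)

def average_precision(y_true, scores):
    """Precision averaged at the rank of each positive, i.e. sweeping
    the score threshold from high to low."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

y_true = [1, 0, 0, 1, 0]
print(balanced_accuracy(y_true, [1, 0, 1, 1, 0]))       # (2/2 + 2/3) / 2
print(average_precision(y_true, [0.9, 0.8, 0.4, 0.7, 0.2]))
```

Note how balanced accuracy is insensitive to the 5:20 positive-to-negative imbalance in query sets, while average precision rewards ranking all positives above all negatives.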

(Figure: model architecture, in which an object encoder is followed by pooling. Schema inputs are lists of per-object attribute dictionaries, e.g.: [{"color": blue, "shape": cylinder, "size": small, "location_x": 80, …}, {"color": blue, "shape": cylinder, "size": large, "location_x": 130, …}, {"color": blue, "shape": cylinder, "size": large, "location_x": 140, …}])
R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > f (u) < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e 
i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 
c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > f (u) < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 
o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 
9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O 2 H l x 3 p 2 P + e i a s 9 g 5 g T 9 w P n 8 A h 8 e R 7 g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " We consider three modalities each of which is processed with a modality specific encoder, followed by four kinds of pooling architecture which take as input objects and their corresponding locations to provide an encoding for the datapoint. Training (right). The model is trained by processing the support images Dsupp with positive (green) and negative (red) images, using f (x) to compute Lquery which computes generalization error on queries and Lconcept which learns to decode the true concept as an auxiliary task. Losses are weighted by α ≥ 0. 
x R p H G 7 e G m 1 t L n g / z h B L Z 5 y p w R / Y = " > A A A B 9 H i c b V D L S g M x F L 3 j s 9 Z X 1 a W b Y B H q p s y I o M u i G 5 c V 7 A P a o W T S O 2 1 o J j M m m U I Z + h 1 u X C j i 1 o 9 x 5 9 + Y a b v Q 1 g O B w z n 3 c k 9 O k A i u j e t + O 2 v r G 5 t b 2 4 W d 4 u 7 e / s F h 6 e i 4 q e N U M W y w W M S q H V C N g k t s G G 4 E t h O F N A o E t o L R X e 6 3 x q g 0 j + W j m S T o R 3 Q g e c g Z N V b y w 0 o 3 o m Y Y h F k 6 v e i V y m 7 V n Y G s E m 9 B y r B A v V f 6 6 v Z j l k Y o D R N U 6 4 7 n J s b P q D K c C Z w W u 6 n G h L I R H W D H U k k j 1 H 4 2 C z 0 l 5 1 b p k z B W 9 k l D Z u r v j Y x G W k + i w E 7 m E f W y l 4 v / e Z 3 U h D d + x m W S G p R s f i h M B T E x y R s g f a 6 Q G T G x h D L F b V b C h l R R Z m x P R V u C t / z l V d K 8 r H p u 1 X u 4 K t d u F 3 U U 4 B T O o A I e X E M N 7 q E O D W D w B M / w C m / O

4.1. TRAINING LOSS

Denote by u ∈ R^M the input to the model, which can be an image, a sound, or a schema. We work in a binary classification setting with labels y ∈ Y = {0, 1}. Then, given a support set D_supp = {(u_i, y_i)}_{i=1}^T and a query set D_query = {(u_i, y_i)}_{i=1}^T, both sampled in accordance with a productive concept h, our training objective for a single training instance can be written as L_query + αL_concept. Here L_query = Σ_{(u,y)∈D_query} log p(Y = y | u, D_supp) is a standard maximum-likelihood meta-learning objective (Ravi & Larochelle, 2016; Snell et al., 2017; Finn et al., 2017), and L_concept = log p(H = h | D_supp) is an optional regularizer designed to encourage retaining information about the hypothesis of interest from the support set.
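As a minimal sketch, the per-episode objective above (negated for minimization) can be written as follows, assuming the per-query probabilities p(Y = 1 | u, D_supp) and the concept log-probability have already been produced by the model:

```python
import math

def episode_loss(query_probs, query_labels, log_p_concept=None, alpha=0.0):
    """Per-episode objective, negated for minimization: -(L_query + alpha * L_concept).

    query_probs:   list of p(Y = 1 | u, D_supp) for each query point.
    query_labels:  list of binary labels y in {0, 1}.
    log_p_concept: scalar log p(H = h | D_supp), used only when alpha > 0.
    """
    # L_query: log-likelihood of the query labels under the model.
    l_query = sum(
        math.log(p if y == 1 else 1.0 - p)
        for p, y in zip(query_probs, query_labels)
    )
    total = l_query
    if alpha > 0.0 and log_p_concept is not None:
        total += alpha * log_p_concept  # auxiliary concept-decoding term
    return -total  # maximizing log-likelihood == minimizing its negation
```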

4.2. BASELINE MODEL ARCHITECTURES

Our baseline models (shown in Figure 4) parameterize the probability in the L_query term above using prototypical networks (Snell et al., 2017). The prototypical network consists of an embedding function f = f_θ and uses it to compute prototypes c_p and c_n by averaging f(u) over the positive and negative examples in the support set, respectively. Given a query datapoint u', we compute

p(Y = 1 | u', D_supp) = exp(-||f(u') - c_p||^2) / (exp(-||f(u') - c_p||^2) + exp(-||f(u') - c_n||^2))    (1)

In this formalism, the models we study in this paper span different choices of f. Roughly, in each modality we start with an encoder that converts the raw input into a set of vectors, followed by a pooling operation that converts that set of vectors into a single vector. For images and sounds (input as spectrograms), the encoder is a ResNet-18 and the set of vectors is a subsampling of spatial locations; for schemas, we vectorize components with a lookup table and combine them into a set via feed-forward networks. For images and sounds, the output of the encoder is enriched with position vectors. For the pooling operation, we study global averaging, concatenation, relation networks (Santoro et al., 2017), and transformers (Vaswani et al., 2017) equipped with different pooling operations (max, mean, sum, min) for reasoning, inspired by Wang et al. (2019) (Figure 4, middle panel; see Appendix F for more details). For the probability in L_concept, we represent the concept as a sequence by prefix serialization and then use an LSTM (Hochreiter & Schmidhuber, 1997) to decode it.
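Equation (1) can be sketched in a few lines. The embedding function f is assumed to be given; embeddings are represented here as plain lists of floats rather than tensors:

```python
import math

def proto_probability(f_query, pos_embeddings, neg_embeddings):
    """p(Y = 1 | u', D_supp) from Eq. (1): a softmax over negative squared
    distances from the query embedding to the positive/negative prototypes.

    f_query: embedding f(u') of the query point.
    pos_embeddings / neg_embeddings: embeddings f(u) of support positives/negatives.
    """
    def mean(vectors):
        # Component-wise average of a list of equal-length vectors.
        return [sum(xs) / len(vectors) for xs in zip(*vectors)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    c_p, c_n = mean(pos_embeddings), mean(neg_embeddings)  # prototypes
    s_p = math.exp(-sq_dist(f_query, c_p))
    s_n = math.exp(-sq_dist(f_query, c_n))
    return s_p / (s_p + s_n)
```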

5. EXPERIMENTAL RESULTS

We first discuss the compositionality gap induced by the different generalization splits, and then delve into the impact of modeling choices on performance across the splits. All models are trained for 1 million steps, with 3 independent training runs used to report standard deviations. We sweep over 3 modalities (image, schema, sound), 4 pooling schemes (avg-pool, concat, relation-net, transformer), 2 choices of negatives (hard negatives, random negatives), and the use of concept supervision (α = 0.0 or 1.0). Unless mentioned otherwise, in the main paper we focus on results with hard negatives and α = 0.0. When instantiated for a given modality, the encoders f(u) (Figure 4) all have a similar number of parameters. The appendix contains the exact hyperparameters (Appendix E) and more comprehensive results for each split (Appendix G.7).

5.1. DATASET DESIGN AND COMPOSITIONALITY

How compositional are the structured splits? Our main results are shown in Figure 5. Using our model-independent measure of the compositionality gap (Section 3), different splits present varying challenges for generalizing from train to test. The most difficult splits, with the largest compositionality gaps, are Binding (color) and Binding (shape), which is reasonable since they require learning concepts with entirely new property-values. In contrast, the easiest split, with the smallest compositionality gap, is the Instance IID split, since it does not require compositionality. Finally, while the mAP metric exposes a larger comp gap, the ordering of splits in terms of comp gap is the same for both metrics, suggesting similar coarse-grained notions of compositionality. Results for the best overall architecture, a relation network (relation-net), are shown in Figure 5. Network performance on the easiest data format (schema; yellow bars) is generally better than the weak oracle, but substantially worse than the strong oracle. Counting is a particularly challenging split, where the models underperform even the weak oracle. Broadly, this suggests that the models capture some notion of compositionality (especially for images and schemas) relative to a weak oracle that rigidly considers only training hypotheses, but there is substantial room to improve, especially with respect to the more stringent mAP metric. These results demonstrate that CURI provides a challenging yet tractable setting for evaluating the compositional capabilities of models. Finally, we found that performance on the Instance IID split does not match the weak (and strong) oracle, which are both equal in this case, indicating that the best model does not make ideal posterior predictions even when compositionality is not an issue. Ideal predictions in this case would require the network to behave as if marginalizing over the training hypotheses, as the strong oracle does.
A similar plot to Figure 5 for random negatives can be found in Appendix G.4. Influence of Negatives. Previous work (Hill et al., 2019) has shown that the choice of random vs. hard negatives for training and evaluation substantially impacts compositional generalization, in the case of a particular set of analogical reasoning models. However, we argue that such dataset-design decisions can be made more objectively when one can evaluate the model-independent comp gap. In our context, we find that the comp gap with mAP decreases on average by 5.5 ± 1.4% when using random negatives compared to hard negatives. This indicates that it is not only the choice of H_train and H_test, which are identical for a given compositional split (say Counting), but also the choice of negatives that "makes" the task compositionally novel. More generally, this suggests that the comp gap has utility as a general diagnostic tool for making principled design decisions in compositional learning settings, without the confound of specific model choices.

5.2. DIFFERENCES BETWEEN MODELS

Best Models. In general, the best-performing model is the relation-net applied to schema inputs, outperforming other combinations of models and input modalities on the Boolean, Concept IID, Complexity, and Instance IID splits on both the mAP and accuracy metrics (Figure 5), although, as mentioned above, none of the models are close to the strong oracle. It is closely followed by the transformer model on schema inputs, which performs best on the Binding (color), Binding (shape), and Intrinsic splits (Appendix G.7). Schema inputs prove easier for abstraction except in the Extrinsic setting, where the task requires generalization to novel object locations in images, which is well supported by the inductive bias of the CNN encoder (Figure 4). In this case, the image-transformer gets an mAP of 62.1 ± 0.7%, compared to the next best schema-transformer model at 60.9 ± 0.7%. Further, relational learning proves more crucial in the schema case than for images: all image models (regardless of pooling) perform better than 59.4 ± 1.3% mAP (achieved by image-avg-pool), while schema-avg-pool models only get to 53.4 ± 1.5%. When to use a transformer? Transformer models appear to outperform relation networks on splits concerning disentangling. For instance, on the Intrinsic split, schema-relation-net is at 55.1 ± 0.8% vs. 57.9 ± 0.6% for schema-transformer. Similarly, on the Extrinsic split, the image-transformer is at 62.1 ± 0.7% compared to the image-relation-net at 60.8 ± 1.1%. We hypothesize that this is because the iterative message passing via attention in transformers improves object representations for disentangling, compared to relation networks which lack such a mechanism. What is the relative difficulty of abstraction from different modalities? One of the key contributions of our work is in providing multiple modalities (image, schema, sound) for productive concept learning.
We next characterize the difficulty of abstraction based on modality for the various generalization settings. In the Intrinsic setting, we find that schema models, which have access to a "perfect" disentangled representation, significantly outperform image models: a schema-avg-pool model gets an mAP of 52.7 ± 3.1%, while an image-avg-pool model gets to 34.4 ± 0.0% mAP. Similarly, for the Counting split, where the total number of objects is exactly specified in the schema (Figure 4), schemas are substantially better than images. For example, schema-relation-nets get to 56.25 ± 5.32% mAP, while image-avg-pool is at 48.4 ± 1.2% mAP. Interestingly, the next best model, image-relation-net, is substantially worse, at 39.45 ± 1.6%. Curiously, while transformer models perform well at disentangling, they seem to be quite poor at Counting, with image-transformer models getting to only 32.4 ± 1.4% mAP, suggesting a potential weakness of transformers. Overall, there appears to be an intimate link between the generalization setting and the input modality, suggesting avenues where representation learning could be improved for a given modality (e.g. images) relative to the kind of reasoning one is interested in (e.g. counting). When does language help? On average, training models with explicit concept supervision using the concept loss (Section 4.1) improves performance by 2.8 ± 0.6% mAP (SEM error). This is a small boost relative to the gap between the original model and the strong oracle, suggesting that this simple auxiliary loss is not sufficient to internalize the LOT in a neural network. Overall, image models benefit more from language than schema models, which natively utilize symbols (Appendix G.3).

6. CONCLUSION

We introduced the compositional reasoning under uncertainty (CURI) benchmark for evaluating few-shot concept learning in a large compositional space, capturing the kinds of productivity, unboundedness, and underdetermination that characterize human conceptual reasoning. We instantiate a series of meta-learning tasks, and evaluate numerous baseline models on various aspects of compositional reasoning under uncertainty, including inferential coherence, boolean operation learning, counting, disentangling, etc. Further, we introduce the notion of a compositionality gap to quantify the difficulty of each generalization type, and to estimate the degree of compositionality in current deep learning models. We hope our contributions of dataset, compositionality gaps, evaluation metrics and baseline models help spur progress in the important research direction of productive concept learning.

B.2 SAMPLING

We sample 2,000,000 initial hypotheses from the CFG G, imposing a maximum depth of 6 in the recursion tree when sampling; that is, no node has a depth larger than 6 in the recursion through which we generate concepts from the grammar G. We then reject and filter the hypotheses to obtain a set of "interesting" hypotheses H used in the main paper, as explained in more detail below. Rejection Sampling: We reject the following string combinations after sampling from the grammar G: • All programs which contain "λS. for-all x" and "S - x" in the same program. This asks that, for all objects in a scene, a certain property is satisfied by everything other than the object, which is the same as saying "for all objects in the scene". • All programs where we compare a property of an object to itself, e.g. color?(x) == color?(x), where color? can be any function applied to the object. • All programs which contain the string exists(color?(S) == color?(x)) or for-all(color?(S) == color?(x)), where color? can be any function applied to the object. • All programs which evaluate to true on more than 10% of schemas, or on fewer than 10 schemas. The former condition ensures that we work with concepts which are in some sense interesting and surprising (as opposed to concepts which are trivially true), and the latter ensures that we have enough unique schemas or datapoints to place in the support and query sets, which both contain 5 positive images each. We provide examples of concepts which were rejected for being true too often: exists=x \in S or( =(locationX?( x ), locationY?( x ) ), any(color?( S ), brown ) ) exists=x \in S and( exists=(locationY?( S ), locationX?( x ) ), any(color?( S ), brown ) ) exists=x \in S or( all(color?( S ), gray ), all(color?( S ), brown ) ) See Appendix C for more details on the structured generalization splits, which yield train concepts H_train and test concepts H_test.
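The truth-rate filter in the last rejection criterion above can be sketched as follows; the hypothesis evaluators and schema representation here are hypothetical stand-ins for the paper's actual stack-based concept executor:

```python
def filter_interesting(hypotheses, schemas, min_true=10, max_rate=0.10):
    """Keep hypotheses that are true on at least `min_true` schemas but on no
    more than a `max_rate` fraction of them.

    hypotheses: dict mapping concept string -> callable(schema) -> bool.
    schemas:    list of scene schemas in whatever form the callables accept.
    """
    kept = {}
    for name, evaluate in hypotheses.items():
        n_true = sum(1 for s in schemas if evaluate(s))
        # Too rare -> cannot fill the 5-positive support/query sets;
        # too common -> the concept is trivially true and uninteresting.
        if min_true <= n_true <= max_rate * len(schemas):
            kept[name] = evaluate
    return kept
```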

B.3 CONCEPT PRIOR WEIGHT w(h)

We next explain the form of the prior weight w(h) used to define the prior over concepts provided to the models (both oracles and deep learning models). Given l(h), the number of tokens in the postfix serialization of the concept h, the unnormalized weight is log-linear in the length: w(h) ∝ exp(-0.2 · l(h)). Given a split Ω ∈ {train, test}, the final normalized weight is w̄(h) = w(h) / Σ_{h'∈H_Ω} w(h'). As explained in the main paper, the final prior for a hypothesis given a split Ω is p_Ω(h) = Σ_{h'∈H_Ω} w̄(h') I[h = h']. Our choice of the log-linear weight is inspired by the observation in cognitive science that longer boolean concepts are harder for people to learn (Feldman, 2000). In order to create the perceptual inputs in the dataset U, we sample images using the renderer for CLEVR from Johnson et al. (2016), changing the range of objects to [2, 5] to reduce clutter and enable easier learning of predicates like any and all. The CLEVR renderer produces scenes u with pixels as well as an associated schema file u_s detailing the properties of all the objects sampled in the scene, including their location, shape, size, material, and rotation. Based on this, we convert our sampled concepts into postfix notation and execute them on the schemas using an operator stack. Concretely, execution of the concept h ∈ H on u_s yields a boolean value in {0, 1}. We execute each hypothesis on a set of 990K images, yielding scores of how often a hypothesis is true for an image. We threshold this score to retain the subset of hypotheses which are true for no more than 10% of the images and for at least 10 images, to pick a subset of "interesting" hypotheses H for training models. Bias. The sampled image dataset itself has a bias in terms of location coordinates (in pixel space).
The CLEVR generation process samples objects uniformly in the top-down (x, y) coordinate space (from a grid spanning -3 to +3). However, since the camera always looks into the scene from outside, the image formation geometry implies that in camera/image coordinates most objects appear far from the camera and very few appear close to it. Thus, the distribution of y-coordinates observed in image coordinates is not uniform. This also makes sense in general: even in the real world, objects are typically neither very close to nor very far from the camera. See Figure 12 for all the biases in the observation space u, computed over 990K sampled images.
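The length-based prior from Section B.3 can be sketched in a few lines; the 0.2 coefficient follows the weight definition above, and hypotheses are represented simply by their token lengths:

```python
import math

def concept_prior(hypothesis_lengths, beta=0.2):
    """Normalized length-based prior over one split's hypotheses:
    w(h) ∝ exp(-beta * l(h)), normalized over the split H_Ω.

    hypothesis_lengths: dict mapping concept string -> token length l(h).
    """
    weights = {h: math.exp(-beta * l) for h, l in hypothesis_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}
```

Shorter concepts receive strictly higher prior mass, mirroring the finding that longer boolean concepts are harder for people to learn.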

B.5 AUDIO.

To build the audio data, we use clips of orchestral instruments playing various pitches, downloaded from https://philharmonia.co.uk/resources/sound-samples/. We map object properties to sound as follows: • x location → temporal location: a larger x bin means the note is played later. • y location → pitch: all pitches between the instruments are the same (up to octaves). • color → instrument: gray → trumpet, red → clarinet, blue → violin, green → flute, brown → oboe, purple → saxophone, cyan → french horn, yellow → guitar. • shape → amplitude profile: either getting louder, getting softer, or constant volume. • size → total volume. • material → low-pass filtering or no filtering. All binned quantities use the same number of bins as in the image domain. For schema models, each object representation o_i ∈ R^96. We use a learning rate of 1e-4 for image models, 1e-3 for schema models, and 5e-5 for sound models, picking the best learning rate for each modality from an initial sweep. The batch size for image and sound models is 8 episodes per batch, while for schemas we use a batch size of 64. All models use the Adam optimizer. The overall scene representation across all modalities has size 256, that is, u ∈ R^256. All our models are initialized with the method proposed by Glorot & Bengio (2010), which we found to be crucial for training relation networks well. The initial representation from the first stage of the encoder (Fig. 4 in the main paper) has a spatial size of 10x8 for images, i.e. 80 objects, while sound representations have 38 objects. In the schema case, the number of objects is the ground-truth number of objects, which is provided as input to the model. We trained all models on the training set and picked the best-performing checkpoints on training (measured in terms of mAP) to report all results in the main paper.
Our image and schema models are trained for 1 million steps (16 epochs for images, 128 epochs for schemas), while the sound models are trained for 500K steps; checkpoints are stored every 30K steps. All our models fit on a GPU with 16GB of memory, except the relation network trained on image inputs, which needs a 32GB GPU. We use the PyTorch framework to implement all our models.
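The property-to-audio mappings from Section B.5 can be captured in a small lookup table, sketched below. The field names, the bin handling, and the direction of the material-to-filter mapping are illustrative assumptions, not the dataset's exact implementation:

```python
# Hypothetical lookup mirroring the color -> instrument mapping in B.5.
COLOR_TO_INSTRUMENT = {
    "gray": "trumpet", "red": "clarinet", "blue": "violin", "green": "flute",
    "brown": "oboe", "purple": "saxophone", "cyan": "french-horn",
    "yellow": "guitar",
}

def render_object(obj):
    """Map one schema object to the audio parameters described above."""
    return {
        "onset_bin": obj["x_bin"],             # x location -> temporal position
        "pitch_bin": obj["y_bin"],             # y location -> pitch (octave-aligned)
        "instrument": COLOR_TO_INSTRUMENT[obj["color"]],
        "amplitude_profile": obj["shape"],     # louder / softer / constant volume
        "volume": obj["size"],                 # size -> total volume
        "low_pass": obj["material"] == "rubber",  # assumed material -> filter mapping
    }
```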

F MODEL ARCHITECTURES FOR POOLING

In this section we detail the exact architectures used for the different pooling operations we consider in this paper as shown in Figure 4 center panel. We first establish some notation. Let o i ∈ R K be the output object feature from the modality specific encoder (Figure 4 , left panel), and let us denote by O = {o i } |O| i=1 the set of features for each of the objects in the scene, which includes optional position information indicating where the object is present in the scene (Figure 4 ). Let N be the requested dimensionality of the feature space from the pooling operation. Given this, we can describe the pooling operations used as follows: • avg-pool: We first average the representations across all the objects {o i } |O| i=1 and then pass the averaged representation through an MLP with 256 x 512 x 384 x N units with batch normalization and rectified linear unit nonlinearity in the hidden layers. • concat: We first concatenate all the object representations in O, followed by an MLP with 256 x 512 x 256 x N units with batch normalization and rectified linear units nonlinearity in the hidden layers. • relation-net: For relation networks, following (Santoro et al., 2017) we use relative position encoding that captures the relative positioning of the objects in a scene for image and sound modalities, and use the location information already present in the schema modality. Based on this, in the terminology of Santoro et al. (2017) our g() MLP has 256 x 256 x 256 x 256 hidden units with rectified linear unit non linearity and batch normalization whereas our f () MLP has 256 x 256 x N units with recitifed linear unit non linearlity and batch normalization in the middle (non-output) layers. Different from the original paper, we do not use dropout as we did not observe any overfitting in our experiments. • transformer: We use a 2-head multi-head attention layer stacked 4 times, with the feedforward network dimenstions set to 512. 
After forwarding through this module, we take the output vectors o_i for each object processed through these initial layers and pool across objects by applying max(), mean(), sum(), and min() operations and concatenating their outputs, similar to previous work by Wang et al. (2019). The final representation is then obtained by a linear projection of this concatenated vector to N, the dimensionality expected from the pooling module.
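As a rough illustration, the multi-operation pooling step above (max/mean/sum/min followed by concatenation) can be sketched in NumPy; the function name is ours, and the actual models operate on learned torch tensors with a trailing linear projection to N:

```python
import numpy as np

def multi_pool(obj_feats):
    """Pool a variable-size set of object features (|O| x K) into a fixed
    vector by concatenating max, mean, sum and min across objects, as in
    the transformer pooling head described above (projection omitted)."""
    pooled = [f(obj_feats, axis=0) for f in (np.max, np.mean, np.sum, np.min)]
    return np.concatenate(pooled)  # shape: (4 * K,), independent of |O|
```

The output size is fixed at 4K regardless of how many objects are in the scene, which is what makes this usable across scenes with varying object counts.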

G.1 HYPERPARAMETER SWEEPS -OBJECT FEATURE DIMENSIONS

We next show the hyperparameter sweeps for image models used in determining the dimensionality chosen to represent each object o_i (Figure 14).

G.7 DETAILED RESULTS ON ALL THE SPLITS IN HARD NEGATIVES SETTING.

In this section we provide the full results of all of the tested models on each of the splits considered in the paper, in the hard negatives setting. Tables 1 to 9 show the results of the different models (sorted in descending order of mAP) for each of the splits considered in the paper, in the case where models do not have access to language.

G.8 DETAILED RESULTS ON ALL THE SPLITS IN EASY NEGATIVES SETTING.

In this section we provide the full results of all of the tested models on each of the splits considered in the paper, in the easy negatives setting. Tables 10 to 18 show the results of the different models (sorted in descending order of mAP) for each of the splits considered in the paper, in the case where models do not have access to language. Note that we did not evaluate transformer models or sound models in this setting, as it is qualitatively less interesting than the hard negatives setting and is not the main focus of the paper.



While some strings h might be different in surface form, they may yield the same results when applied to images. In this split we account for such synonymy, and ensure that no two concepts which are synonyms appear in different splits. See Appendix B.6 for more details. Set to be log-linear in the prefix serialization length of the hypothesis, inspired by the observation that longer hypotheses are more difficult for humans (Feldman, 2000). See Appendix B.3 for more details. Since the chances of a constraint being true for all objects reduce exponentially as the number of objects increases.



Figure 2: CURI Task. Given as input a support set Dsupp with positive and negative examples corresponding to a concept, the model has to infer the concept and produce accurate predictions on novel images (right).

Figure 4: Baseline models (left). Different choices for the encoder f(u) parameterization explored in the paper. We consider three modalities, each of which is processed with a modality-specific encoder, followed by four kinds of pooling architectures which take as input objects and their corresponding locations to provide an encoding for the datapoint. Training (right). The model is trained by processing the support images Dsupp with positive (green) and negative (red) images, using f(x) to compute Lquery, which measures generalization error on queries, and Lconcept, which learns to decode the true concept as an auxiliary task. Losses are weighted by α ≥ 0.

Figure 6: Qualitative Example of an Episode in CURI dataset. Best viewed zooming in, in color.

Figure 7: Qualitative Example of an Episode in CURI dataset. Best viewed zooming in, in color.

Figure 8: Qualitative Example of an Episode in CURI dataset. Best viewed zooming in, in color.

Figure 9: Qualitative Example of an Episode in CURI dataset. Best viewed zooming in, in color.

Figure 10: Qualitative Example of an Episode in CURI dataset. Best viewed zooming in, in color.

Figure 11: Qualitative Example of an Episode in CURI dataset. Best viewed zooming in, in color.

Here are some examples of hypotheses with a high weight (computed on Ω = train ∪ test):

exists x in S =(2, count=(color?(S_{-x}), cyan))
exists x in S >(locationY?(x), 6)
=(count=(color?(S), brown), 3)
>(count=(locationX?(S), 3), 2)
any(locationY?(S), 6)
=(1, count=(locationY?(S), 7))
=(3, count=(locationY?(S), 3))
all(locationX?(S), 2)
exists x in S all(locationY?(S_{-x}), 5)
=(2, count=(color?(S), blue))
for-all x in S not(>(6, locationX?(x)))
=(count=(color?(S), gray), 2)
=(2, count=(color?(S), gray))

B.4 EXECUTION ON IMAGES.

Figure 16: mAP on validation for hard negatives (y-axis) vs number of training steps (x-axis) for relation network models on images with different amounts of language usage by varying the parameter α.

Figure 17: mAP on validation for hard negatives (y-axis) vs number of training steps (x-axis) for relation network models on schemas with different amounts of language usage by varying the parameter α.

Figure 18: mAP on validation for hard negatives (y-axis) vs number of training steps (x-axis) for concat pooling models on schemas with different amounts of language usage by varying the parameter α.


Language of thought. All valid (type-consistent) compositions of functions are potential complex concepts in our dataset. Note that the functions are illustrated for the case of images and schemas; location, size, shape, etc. correspond to different properties for sounds.

relation-network, global average pooling, concatenation), 3) whether or not training provides ground-truth symbolic descriptions of concepts, and 4) how negative examples are sampled. Overall, our evaluations suggest that there is substantial room for improvement in compositional reasoning under uncertainty w.r.t. the compositionality gap, representing a novel challenge for compositional learning.

• Concept IID: Evaluates generalization to novel concepts based on an arbitrary random split of the concepts into H train and H test.¹
• Counting: Evaluates the ability to learn a new concept h with novel property-count combinations; e.g., the training concepts never filter for exactly '3 squares'.
• Extrinsic properties: Evaluates the ability to learn a new concept h with novel combinations of extrinsic (e.g., location) and intrinsic (e.g., color) object properties.
• Intrinsic properties: Evaluates the ability to learn a new concept h with novel combinations of intrinsic properties; e.g., the training concepts never reference both 'red' and 'rubber'.
• Boolean operations: Evaluates the ability to learn concepts which require applying a familiar boolean operation to a property to which the operation has never been applied previously.
• Complexity split: Evaluates generalization from simple concepts (those which have less than

to parameterize p(h | D_supp) = ∏_{s=1}^{S} p(h_s | h_{1···s−1}; D_supp). At each step of the LSTM we concatenate [c_p, c_n] to the input.

Compositionality Gap. Different splits (x-axis) plotted w.r.t. the performance of the strong oracle (green line) and weak oracle (red line) on mAP (top) and accuracy (bottom), evaluated on the respective test splits (using hard negatives in support and query sets). The difference between the two is the compositionality gap (comp gap). Yellow: the (best) relation-net model on schema inputs; purple: the model on image inputs; gray: the model on sound inputs. Error bars are std. dev. across 3 independent model runs.
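The chain-rule factorization of p(h | D_supp) can be sketched numerically; the function below (name ours, an illustrative sketch rather than the paper's decoder) scores a token sequence given per-step log-probabilities that an LSTM would produce after conditioning on the support set and previously decoded tokens:

```python
import numpy as np

def sequence_log_prob(step_log_probs, token_ids):
    """Chain rule: log p(h | D_supp) = sum_s log p(h_s | h_{<s}; D_supp).
    step_log_probs: (S, V) array of per-step log-probabilities over the
    token vocabulary, each row already conditioned on D_supp and the
    previously decoded tokens; token_ids: the S tokens of hypothesis h."""
    return float(sum(step_log_probs[s, t] for s, t in enumerate(token_ids)))
```

This is the quantity maximized by the auxiliary concept-decoding loss Lconcept: each term scores one token of the serialized hypothesis.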

Performance on meta-test, sorted based on mAP (in %) on Binding (color) with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Boolean with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Intrinsic with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Concept IID with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Instance IID with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Extrinsic with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Complexity with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Binding (shape) with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Counting with hard negatives

Performance on meta-test, sorted based on mAP (in %) on Binding (color) with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Boolean with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Counting with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Extrinsic with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Intrinsic with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Concept IID with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Instance IID with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Complexity with easy negatives

Performance on meta-test, sorted based on mAP (in %) on Binding (shape) with easy negatives

A EXAMPLE EPISODES FROM THE DATASET

We show examples from the Concept IID split test set comprising the ground truth productive concept (top), along with the support and query sets for meta learning (rendered as images), the alternate hypotheses which are consistent with the support set (that is, other hypotheses which could also have generated the positive and negative examples in the support set), and the concepts based on which we pick the hard negatives (Figures 6 to 11).

B ADDITIONAL DATASET DETAILS

We first provide more details of the concept space G, then explain how we obtain H (the space of concepts for training and evaluation), provide more details of the structured splits, and finally explain the weight w(h) based on which we sample concepts.

B.1 MORE DETAILS OF THE GRAMMAR

We provide below the full grammar used to sample concepts, where A → B | C means that A can expand to either B or C under the rules defined by the grammar. We always start expanding at the START token and then follow the rules of the grammar until we hit a terminal node (which does not have any expansions defined). Where possible, we followed the insights from Piantadosi et al. (2016) in choosing the sampling probabilities for the various completions, based on how well humans seem to be able to learn the corresponding primitive. For example, we sample utterances with disjunctions (or) less frequently, since they are known to be difficult for humans to learn. Based on Kemp & Jern (2009), we chose to represent location as a discrete entity, such that relative and categorical notions of left or right simply become comparisons in the location space (location?(x) > location?(S_{-x})), unlike the CLEVR dataset (Johnson et al., 2016), which defines categorical relational objects. Here is the full grammar G used for sampling the concepts (as explained in the main paper, S_{-x} = S \ {x}). Note that the grammar always generates strings in postfix notation, and thus the operands in each expansion occur before the operation:

We next show an analysis of concepts which have the same evaluation signatures on a large set of 990K images, and are thus synonymous (in the context of the dataset at hand). Note that while some of these concepts might be truly synonymous with each other (for example, A > B is the same as B < A), others might be synonymous only in the context of the image distribution we work with. For example, size can never be greater than 0.7 in our dataset and location can never be greater than 8, so asking whether location is greater than 8 or whether size is greater than 0.7 has the same semantics on our dataset.
In Figure 13 we show each such "concept" or "meaning", i.e. a cluster of hypotheses which all evaluate to the same truth values, and plot a histogram of how many hypotheses each cluster tends to contain. We notice that most concepts have a single member (i.e. there is only one hypothesis with that particular evaluation signature), with a long tail going up to 80 synonyms in a concept. In the Concept IID split we ensure that no concepts with the same signature are found across the train/val/test splits.
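The synonym analysis amounts to grouping hypotheses by their evaluation signature. A minimal sketch, assuming each hypothesis can be modeled as a callable from a scene to a truth value (the actual pipeline evaluates serialized concepts on 990K schemas):

```python
from collections import defaultdict

def cluster_by_signature(hypotheses, scenes):
    """Group hypotheses into synonym clusters: two hypotheses are synonyms
    w.r.t. the dataset iff they evaluate to the same truth value on every
    scene. `hypotheses` maps a name to a callable scene -> bool; this is
    an illustrative sketch, not the paper's actual implementation."""
    clusters = defaultdict(list)
    for name, h in hypotheses.items():
        signature = tuple(h(s) for s in scenes)  # evaluation signature
        clusters[signature].append(name)
    return list(clusters.values())
```

The histogram in Figure 13 then simply counts cluster sizes; ensuring synonyms stay in the same split amounts to splitting at the cluster level rather than the hypothesis level.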

C DETAILED DISCUSSION OF THE STRUCTURED SPLITS

We provide more details on how each of the structured splits described in Sec. 3 of the main paper is created. Assuming access to H, the space of concepts sampled and filtered from the grammar G, we use various heuristics to produce the generalization splits in the paper:
• Concept IID: This split divides concepts into train and test by picking concepts at random from H and assigning them to H train or H test, while ensuring that no two concepts which are synonyms (Appendix B.6) are found in different splits.
• Boolean: This split forms the cross product of all possible colors and the {and, or} boolean operators, and holds out a subset of such combinations to occur only in test. We use the following tokens for test: We then create H test to contain all concepts which have any of the combinations above. For example, if a concept has both green and or, we would place it in H test. After every feasible candidate is placed in H test based on this heuristic, the remaining concepts in H are assigned to H train.
• Extrinsic: This split forms the cross product of all possible colors and locations in the dataset, and holds out a subset of such combinations to occur only in test. We use the following tokens for test (only a subset shown for illustration): We then create H test to contain all concepts which have any of the combinations above. For example, if a concept has both gray and 7, and is related to location, that is, contains the locationX? or locationY? keywords, we would place it in H test. After every feasible candidate is placed in H test based on this heuristic, the remaining concepts in H are assigned to H train.
• Intrinsic: This split forms the cross product of all possible colors and materials in the dataset, and holds out a subset of such combinations to occur only in test.
We use the following tokens for test: 'green', 'metal' | 'purple', 'rubber' | 'cyan', 'rubber' | 'red', 'metal' | 'green', 'rubber'. We then create H_test to contain all concepts which have any of the combinations above. For example, if a concept contains both green and metal, and is related to material, that is, contains the material? keyword, we place it in H_test. After every feasible candidate is placed in H_test based on this heuristic, the remaining concepts in H are assigned to H_train.

• Binding (color): This split takes all possible colors in the dataset and reserves a subset of colors to occur only in test. We use the following tokens for test: We then create H_test to contain all concepts which have any of the tokens above. For example, if a concept contains purple, we place it in H_test. After every feasible candidate is placed in H_test based on this heuristic, the remaining concepts in H are assigned to H_train.

• Binding (shape): This split takes all possible shapes in the dataset and reserves a subset of shapes to occur only in test. We use the following tokens for test: 'cylinder'. We then create H_test to contain all concepts which have any of the tokens above.
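The token-based partitioning heuristic shared by these splits can be sketched as below. This is an illustrative reading of the text, not the released code: concepts are represented as sets of tokens (a hypothetical simplification of their prefix-string form), and the `required_keyword` argument models the extra condition used by the Intrinsic/Extrinsic splits (e.g. the material? keyword).

```python
# Sketch of the split heuristic: a concept goes to H_test if it contains
# every token of any reserved combination (and, where applicable, the
# relevant property keyword); all remaining concepts go to H_train.
TEST_COMBINATIONS = [  # Intrinsic-split combinations listed in the text
    {"green", "metal"}, {"purple", "rubber"}, {"cyan", "rubber"},
    {"red", "metal"}, {"green", "rubber"},
]

def split_concepts(concepts, test_combinations, required_keyword=None):
    h_train, h_test = [], []
    for tokens in concepts:  # each concept given as a set of its tokens
        in_test = any(combo <= tokens for combo in test_combinations)
        if required_keyword is not None:
            in_test = in_test and required_keyword in tokens
        (h_test if in_test else h_train).append(tokens)
    return h_train, h_test

train, test = split_concepts(
    [{"green", "metal", "material?"}, {"blue", "metal", "material?"}],
    TEST_COMBINATIONS, required_keyword="material?")
```

Here the green-metal concept lands in H_test while the blue-metal one stays in H_train, since blue-metal is not a reserved combination.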

D CREATING SUPPORT AND QUERY SETS

We next explain how we go from the initial dataset U, which contains a large number of images, schemas and sounds, and the concept spaces H_train and H_test, to a dataset for meta-learning. To create the training/validation/test sets for models, we sample a series of episodes, each containing a support set and a query set. We illustrate the sampling procedure for a training episode below:

Support Set Sampling with Hard Negatives
1. Pick a concept h ∼ p_train(h), with a preference for shorter hypotheses being more frequent, based on the weights used to define the prior (Appendix B.3).
2. Pick 5 images (P) uniformly at random from U such that h(u_s) = 1, where the concept is evaluated on the schema to determine the label (Appendix B.4).
3. Identify other concepts h′ ∈ H s.t. h′(u_s) = 1 for all u_s ∈ P and h′ ≠ h.
4. Pick images such that h′(u_s) = 1 and h(u_s) = 0 as negatives (N). If no such images exist, pick random images from U as negatives until we have 20 negatives.
5. Return D_supp = P ∪ N.

The sampling procedure for the query set iterates all the steps above (except step 1, where we choose the concept h). Steps 3 and 4 outline a procedure for identifying hard negatives for training the model: we look at other hypotheses which also explain a chosen set of positives P and use them to clarify what the concept of interest is. We give below an analogous procedure for easy negatives:

Support Set Sampling with Easy Negatives
1. Pick a concept h ∼ p_train(h), with a preference for shorter hypotheses being more frequent, based on the weights used to define the prior (Appendix B.3).
2. Pick 5 images (P) uniformly at random from U such that h(u_s) = 1, where the concept is evaluated on the schema to determine the label (Appendix B.4).
3. Pick 20 random images from U as negatives, N.
4. Return D_supp = P ∪ N.

As with hard negatives, the sampling procedure for the query set iterates all the steps above (except step 1, where we choose the concept h).
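The hard-negative procedure above can be sketched as follows. This is a minimal illustration, not the paper's released code: `evaluate(g, u)` is a hypothetical stand-in for evaluating a concept on an item's schema, the concept h is passed in rather than sampled from the prior, and we assume the random fallback in step 4 still draws items with h(u) = 0.

```python
# Sketch of support-set sampling with hard negatives (steps 2-5 above).
import random

def sample_support(hypotheses, universe, evaluate, h, n_pos=5, n_neg=20):
    # Step 2: positives are items on which h evaluates to 1 (True).
    positives = random.sample([u for u in universe if evaluate(h, u)], n_pos)
    # Step 3: competing concepts that also explain all the positives.
    competitors = [g for g in hypotheses
                   if g is not h and all(evaluate(g, u) for u in positives)]
    # Step 4: hard negatives satisfy some competitor but not h itself.
    hard = [u for u in universe
            if not evaluate(h, u) and any(evaluate(g, u) for g in competitors)]
    negatives = random.sample(hard, min(n_neg, len(hard)))
    while len(negatives) < n_neg:  # fallback: random (non-positive) items
        u = random.choice(universe)
        if not evaluate(h, u):
            negatives.append(u)
    return positives, negatives
```

On a toy universe of integers with h = "is even" and a competitor such as "is less than 50", the hard negatives are exactly the odd numbers below 50: items that the competitor explains but h does not, which is what forces a learner to disambiguate between the two.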

E REPRODUCIBILITY AND HYPERPARAMETERS

For all the models in Figure 4 in the main paper, we use the following hyperparameters. All the modalities are processed into a set of objects {o_i}_{i=1}^{N}, where each o_i ∈ R^64 for image and sound

G.2 IMAGE RELATION NETWORKS LEARNING RATE SWEEPS

We picked the learning rate for image models based on the best-performing image relation network model, which an initial sweep found to yield the best class of models. Figure 15 shows the performance of these models across learning rates of {1e-4, 5e-4, 2.5e-4}.

G.3 SWEEP ON USE OF LANGUAGE

As explained in the main paper (Figure 4), the parameter α controls the tradeoff between the query accuracy and the likelihood of the concept expressed as a prefix string. Across a broad range of values in {0.0, 0.01, 0.10, 1.0}, we generally found that models performed best at α = 1.0. Our initial experiments with α = 10.0 suggested substantially worse performance, so we discarded it from the sweep. See Figures 16 to 18 for the corresponding results.

G.4 RESULTS ON EASY NEGATIVES

In Figure 19 we show results for the relation-net model on various splits, where easy negatives are used to populate the support and query sets during training and evaluation, unlike the hard negatives discussed in the main paper (Figure 5). Notice that the compositionality gap (comp gap) is generally lower for easy negatives than for the hard negatives reported in the main paper. Further, we find that the best models are substantially closer to the strong oracle than in Figure 5 of the main paper, showing that on this easier, less compositional task it is easier for machine learning models to approach the strong oracle (especially in terms of accuracy). Finally, it is interesting to note that with easy negatives the best models appear to outperform the weak oracle on the Counting split, while with hard negatives the models are worse than the weak oracle, suggesting poor generalization for counting.

G.5 FINER α SWEEP FOR COUNTING

Finally, we ran a finer α sweep for the Counting split, since our initial sweep suggested that the Counting split was not performing better with language. Concretely, we ran a new set of experiments sweeping over α values of {0.01, 0.10, 1.0, 5.0, 10.0, 100.0}. Across this broader range of values, we found models still did not show any statistically significant gains from using language vs. not using it for the Counting split.

G.6 CHOICE OF METRIC: mAP vs. ACCURACY

In general, the mAP metric opens up a larger comp gap for the various splits than indicated by CBA. For example, with hard negatives, while CBA indicates a gap of 14.2% for Counting compared to 0% for Instance IID, mAP suggests a gap of 34.4% for Counting relative to 0% for Instance IID. For the Binding (color) split the gap is 86.5% (mAP) vs. 34.0% (CBA).
While more expensive to compute, mAP tests more thoroughly whether a concept h is truly learned by the model, by probing its performance on a large, representative set of negatives T, providing a more stringent test of compositional generalization.
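To make the contrast with accuracy concrete, the following toy sketch computes the average precision (AP) for a single concept; mAP is the mean of AP over concepts. This is the standard ranking-based AP definition, not the paper's exact evaluation code.

```python
# AP ranks the whole pool (positives plus the large negative set T) by the
# model's score; a model only scores well if it ranks positives above many
# negatives, which is a harder test than thresholded accuracy.
def average_precision(scores, labels):
    """Mean of precision-at-rank over each positive in the ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

perfect = average_precision([0.9, 0.8, 0.1], [1, 1, 0])   # positives on top
swapped = average_precision([0.9, 0.2, 0.8], [1, 1, 0])   # one negative ranked high
```

A perfect ranking yields AP = 1.0, while misranking even a single negative above a positive lowers AP, which is why a large negative pool T makes the metric stringent.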

