HUMANLY CERTIFYING SUPERHUMAN CLASSIFIERS

Abstract

This paper addresses a key question in current machine learning research: if we believe that a model's predictions might be better than those given by human experts, how can we (humans) verify these beliefs? In some cases, this "superhuman" performance is readily demonstrated, for example by defeating top-tier human players in traditional two-player games. On the other hand, it can be challenging to evaluate classification models that potentially surpass human performance. Indeed, human annotations are often treated as a ground truth, which implicitly assumes the superiority of the human over any models trained on human annotations. In reality, human annotators are subjective and can make mistakes. Evaluating the performance with respect to a genuine oracle is more objective and reliable, even when querying the oracle is more expensive or sometimes impossible. In this paper, we first raise the challenge of evaluating the performance of both humans and models with respect to an oracle which is unobserved. We develop a theory for estimating the accuracy compared to the oracle, using only imperfect human annotations for reference. Our analysis provides an executable recipe for detecting and certifying superhuman performance in this setting, which we believe will assist in understanding the stage of current research on classification. We validate the convergence of the bounds and the assumptions of our theory on carefully designed toy experiments with known oracles. Moreover, we demonstrate the utility of our theory by meta-analyzing large-scale natural language processing tasks, for which an oracle does not exist, and show that under our mild assumptions a number of models from recent years have already achieved superhuman performance with high probability, suggesting that our new oracle-based performance evaluation metrics are overdue as an alternative to the widely used accuracy metrics that are naively based on imperfect human annotations.

1. INTRODUCTION

Artificial Intelligence (AI) agents have begun to outperform humans on remarkably challenging tasks; AlphaGo defeated top-ranked Go players (Silver et al., 2016; Singh et al., 2017), and OpenAI's Dota2 AI has defeated human world champions of the game (Berner et al., 2019). These AI tasks may be evaluated objectively, e.g., using the total score achieved in a game or the victory against another player. However, for supervised learning tasks such as image classification and sentiment analysis, certifying a machine learning model as superhuman is subjectively tied to human judgments rather than to a comparison with an oracle. We focus on paving a way towards evaluating models with potentially superhuman performance in classification. When evaluating the performance of a classification model, we generally rely on the accuracy of the predicted labels with regard to ground truth labels, which we call the oracle accuracy. However, oracle labels may arguably be unobservable. For tasks such as object detection and saliency detection, the predictions depend on many factors specific to the annotators, e.g., their background and physical or mental state. For other tasks, such as predicting molecule toxicity and stability, even experts may not be able to summarize an explicit rule for the prediction. Without observing oracle labels, researchers often resort to two heuristics: i) human predictions or aggregated human annotations are effectively treated as ground truth (Wang et al., 2018; Lin et al., 2014; Wang et al., 2019) to approximate the oracle, and ii) the inter-annotator agreement is taken as the best possible machine learning model performance (for an extensive survey of works that make this claim without proof, see the works cited within Boguslav & Cohen, 2017, and Richie et al., 2022). This heuristic approach suffers from some key disadvantages. Firstly, the quality control of human annotation is challenging (Artstein, 2017; Lampert et al., 2016).
Secondly, current evaluation paradigms focus on evaluating the performance of models, but not the oracle accuracy of humans; yet we cannot claim that a machine learning model is superhuman without properly estimating the human performance as compared to the oracle. Thirdly, as machine learning models exceed human performance on important tasks, it becomes insufficient to merely report the agreement of the model to human annotations.

Figure 1: a) the oracle accuracy of two annotators, P(ℓ_i = ℓ⋆) and P(ℓ_j = ℓ⋆); b) the agreement between two annotators, P(ℓ_i = ℓ_j). ℓ_i and ℓ_j are the labels given by annotators i and j, and ℓ⋆ is the oracle label. In our setting, part a) is unobserved (gray) and part b) is observed (black).

In this paper, we work in the setting where oracle labels are unobserved (see Figure 1). Within this setting, we provide a theory for estimating the oracle accuracy on classification tasks, which formalises what empirical works have hinted towards (Richie et al., 2022): that machine learning classification models may outperform the humans who provide them with training supervision. Our aim is not to optimally combine machine learning systems, but rather to estimate the oracle accuracy of a single machine learning system by comparing it with the results obtained from multiple human annotators.
Our theory includes i) upper bounds on the average oracle accuracy of the annotators, ii) lower bounds on the oracle accuracy of the model, and iii) a finite-sample analysis of both bounds and of their margin, which represents the model's outperformance. Based on our theory, we propose an algorithm to detect competitive models and to report confidence scores, which formally bound the probability that a given model outperforms the average human annotator. Empirically, we observe that some existing models for sentiment classification and natural language inference (NLI) have already achieved superhuman performance with high probability.

2. EVALUATION THEORY

We now present our theory for evaluating human annotators and machine learning models against oracle labels.

2.1. PROBLEM STATEMENT

We are given K labels crowdsourced from K human annotators, {ℓ_i}_{i=1}^K, along with the label given by a model, ℓ_M. The probability that two annotators a_i and a_j give matching annotations is P(ℓ_i = ℓ_j). Denote by ℓ_K the label of the "average" human annotator, which we define as the label obtained by selecting one of the K human annotators uniformly at random. We seek to formally compare the oracle accuracy of the average human, P(ℓ_K = ℓ_⋆), with that of the machine learning model, P(ℓ_M = ℓ_⋆), where ℓ_⋆ is the unobserved oracle label. Denote by ℓ_G the label obtained by aggregating (say, by voting) the K human annotators' labels. We distinguish between the oracle accuracy P(ℓ_M = ℓ_⋆) and the agreement with human annotations P(ℓ_M = ℓ_G), although these two concepts have been conflated in many previous applications and benchmarks.
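To make these quantities concrete, the following is a minimal Python sketch, not the authors' implementation. It estimates the pairwise agreement P(ℓ_i = ℓ_j), builds an aggregated label ℓ_G by majority vote, and contrasts the model's agreement with ℓ_G against its oracle accuracy. All data are hypothetical, the function names `agreement` and `majority_vote` are our own, and the oracle labels are available only because this is a toy example; in the setting of this paper they are unobserved.

```python
from collections import Counter
from itertools import combinations


def agreement(a, b):
    """Empirical estimate of P(a = b): fraction of items where the labels match."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def majority_vote(annotations):
    """Aggregate the annotators' labels item by item (one choice for l_G).

    Ties go to the label encountered first, which is an arbitrary choice.
    """
    return [Counter(item).most_common(1)[0][0] for item in zip(*annotations)]


# Hypothetical toy data: K = 3 annotators, 6 items, binary labels.
annotations = [
    [1, 0, 1, 1, 0, 1],  # annotator 0
    [1, 0, 0, 1, 0, 1],  # annotator 1
    [1, 1, 1, 1, 0, 0],  # annotator 2
]
model = [1, 0, 1, 1, 1, 0]   # hypothetical model predictions, l_M
oracle = [1, 0, 1, 1, 1, 1]  # hypothetical oracle, known only in this toy

# Observable: pairwise agreement between annotators, P(l_i = l_j).
pairwise = {
    (i, j): agreement(annotations[i], annotations[j])
    for i, j in combinations(range(len(annotations)), 2)
}

# Observable: aggregated human label and the model's agreement with it.
aggregated = majority_vote(annotations)
model_vs_aggregate = agreement(model, aggregated)

# Not observable in practice: the model's oracle accuracy.
model_vs_oracle = agreement(model, oracle)

print("pairwise agreement:", pairwise)
print("aggregated labels:", aggregated)
print("model agreement with aggregate:", model_vs_aggregate)
print("model oracle accuracy (toy only):", model_vs_oracle)
```

In this toy, all three annotators share an error on one item, so the model is penalised against ℓ_G for a prediction that the oracle marks as correct. Its agreement with the aggregated label (4/6) is therefore lower than its oracle accuracy (5/6), which illustrates why the two quantities should not be treated as interchangeable.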




Figure 1: The relationship between a) the oracle accuracy of the annotators, P(ℓ_i = ℓ_⋆), and b) the agreement between two annotators, P(ℓ_i = ℓ_j). Here ℓ_i and ℓ_j are the labels given by annotators i and j, and ℓ_⋆ is the oracle label. In our setting, part a) is unobserved (gray) and part b) is observed (black).

funding

* Work was done while the authors were with the Australian National University and Data61 CSIRO.

