HAVE MISSING DATA? MAKE IT MISS MORE! IMPUTING TABULAR DATA WITH MASKED AUTOENCODING

Abstract

We present REMASKER, a novel method for imputing missing values in tabular data by extending the masked autoencoding framework. In contrast to prior work, REMASKER is both simple and effective. It is simple: besides the missing values (i.e., naturally masked), we randomly "re-mask" another set of values, optimize the autoencoder by reconstructing this re-masked set, and apply the trained model to predict the missing values. It is effective: through extensive evaluation on benchmark datasets, we show that REMASKER performs on par with or outperforms state-of-the-art methods in terms of both imputation fidelity and utility under various missingness settings, and its performance advantage often increases with the ratio of missing data. We further explore a theoretical justification for its effectiveness, showing that REMASKER tends to learn missingness-invariant representations of tabular data. Our findings indicate that masked modeling represents a promising direction for further research on tabular data imputation. The code is available at:

1. INTRODUCTION

Missing values are ubiquitous in real-world tabular data, arising for various reasons during data collection, processing, storage, or transmission. It is often desirable to know the most likely values of missing data before performing downstream tasks (e.g., classification or synthesis). To this end, intensive research has been dedicated to developing imputation methods ("imputers") that estimate missing values based on the observed data (Yoon et al., 2019; Jarrett et al., 2022; Kyono et al., 2021; Stekhoven & Buhlmann, 2012; Mattei & Frellsen, 2018). Yet, imputing missing values in tabular data with high fidelity and utility remains an open problem, due to challenges including the intricate correlations across features, the variety of missingness scenarios, and the scarcity of available data relative to the number of missing values.

State-of-the-art imputers can be categorized as either discriminative or generative. Discriminative imputers, such as MissForest (Stekhoven & Buhlmann, 2012), MICE (van Buuren & Groothuis-Oudshoorn, 2011), and MIRACLE (Kyono et al., 2021), impute missing values by modeling their distributions conditional on the other values. In practice, these methods are often hindered by the need to specify proper functional forms for the conditional distributions and to add appropriate regularizers. Generative imputers, such as GAIN (Yoon et al., 2019), MIWAE (Mattei & Frellsen, 2018), GAMIN (Yoon & Sull, 2020), and HI-VAE (Nazabal et al., 2020), estimate the joint distribution of all features by leveraging the capacity of deep generative models and impute missing values by querying the trained models. Empirically, GAN-based methods often require a large amount of training data and suffer from the difficulties of adversarial training (Goodfellow et al., 2014), while VAE-based methods often face the limitations of training through variational bounds (Zhao et al., 2022).
Further, some of these methods either require complete data during training or rely on assumptions about specific missingness patterns.

In this paper, we present REMASKER, a novel method that extends the masked autoencoding (MAE) framework (Devlin et al., 2018; He et al., 2022) to imputing missing values in tabular data. The idea of REMASKER is simple: besides the missing values in the given dataset (i.e., naturally masked), we randomly select and "re-mask" another set of values, optimize the autoencoder with the objective of reconstructing this re-masked set, and then apply the trained autoencoder to predict the missing values. Compared with prior work, REMASKER enjoys the following desiderata: (i) it is instantiated with a Transformer backbone (Vaswani et al., 2017), whose self-attention mechanism is able to capture intricate inter-feature correlations (Huang et al., 2020); (ii) without specific assumptions about the underlying missingness mechanisms, it is applicable to various scenarios even if complete data is unavailable; and (iii) as the re-masking approach naturally accounts for missing values and encourages learning high-level representations beyond low-level statistics, REMASKER works effectively even under a high ratio of missing data (e.g., 0.7).

With extensive evaluation on 12 benchmark datasets under various missingness scenarios, we show that REMASKER performs on par with or outperforms 13 popular methods in terms of both imputation fidelity and utility, while its performance advantage often increases with the ratio of missing data. We further explore a theoretical explanation for its effectiveness: we find that REMASKER encourages learning missingness-invariant representations of tabular data, which are insensitive to missing values. Our findings indicate that, beyond its success in the language and vision domains, masked modeling also represents a promising direction for future research on tabular data imputation.
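The re-masking workflow described above can be sketched in a few lines. This is a minimal illustration under our own naming (`remask`, `remasker_loss`) with a placeholder model passed as a callable, not the Transformer-based implementation used in the paper:

```python
import numpy as np

def remask(x, m, ratio, rng):
    """Given the observed-value indicator m (1 = observed), randomly select
    a fraction `ratio` of the *observed* cells to re-mask. Returns the
    re-mask indicator r (1 = re-masked), disjoint from the missing cells."""
    r = np.zeros_like(m)
    obs = np.argwhere(m == 1)
    k = int(len(obs) * ratio)
    pick = obs[rng.choice(len(obs), size=k, replace=False)]
    r[tuple(pick.T)] = 1
    return r

def remasker_loss(model, x, m, r):
    """Reconstruction loss evaluated only on the re-masked cells: the model
    sees x with both missing and re-masked cells hidden (zeroed here for
    simplicity) and must predict the held-out re-masked values."""
    visible = m * (1 - r)                          # cells the encoder may see
    x_hat = model(np.where(visible == 1, x, 0.0))  # placeholder autoencoder
    return np.sum(r * (x_hat - x) ** 2) / np.sum(r)
```

At inference time, the trained model is applied to the naturally masked input and its predictions at the cells with m = 0 serve as the imputed values.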

2. RELATED WORK

Here, we survey relevant literature in three categories.

Tabular data imputation. Existing imputation methods can be roughly categorized as either discriminative or generative. Discriminative methods (Stekhoven & Buhlmann, 2012; van Buuren & Groothuis-Oudshoorn, 2011; Kyono et al., 2021) often specify a univariate model for each feature conditional on all the others and perform cyclic regression over each target variable until convergence. Recent work has also explored adaptively selecting and configuring multiple discriminative imputers (Jarrett et al., 2022). Generative methods either implicitly train imputers as generators within the GAN framework (Yoon et al., 2019; Yoon & Sull, 2020) or explicitly train deep latent-variable models to approximate the joint distribution of all features (Mattei & Frellsen, 2018; Nazabal et al., 2020). There are also imputers based on representative-value substitution (e.g., mean, median, or most frequent values) (Hawthorne & Elliott, 2005), EM optimization (García-Laencina et al., 2010), matrix completion (Hastie et al., 2015), or optimal transport (Muzellec et al., 2020).

Transformer. The Transformer has emerged as a dominant design in the language domain (Vaswani et al., 2017), in which multi-head self-attention and MLP layers are stacked to capture both short- and long-term correlations between words. Recent work has explored the use of Transformers in the vision domain by treating each image as a grid of visual words (Dosovitskiy et al., 2020). For instance, they have been integrated into image generation models (Jiang et al., 2021; Zhang et al., 2021; Hudson & Zitnick, 2021), achieving performance comparable to CNN-based models.

Masked autoencoding. Autoencoding is a classical method for learning representations in a self-supervised manner (Vincent et al., 2008; Pathak et al., 2016): an encoder maps an input to its representation and a decoder reconstructs the original input.
Meanwhile, masked modeling was originally proposed as a pre-training method in the language domain: by holding out a proportion of a word sequence, it trains the model to predict the masked words (Devlin et al., 2018; Radford & Narasimhan, 2018). Recent work has combined autoencoding and masked modeling in vision tasks (Dosovitskiy et al., 2020; Bao et al., 2022). In particular, the seminal MAE (He et al., 2022) represents the state of the art in self-supervised pre-training on the ImageNet-1K benchmark. This work is also related to approaches that model missing data by adapting existing model architectures (Przewięźlikowski et al., 2021). To the best of our knowledge, this is the first work to explore the masked autoencoding method with Transformers for the task of tabular data imputation.
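As a concrete illustration of the cyclic-regression paradigm used by discriminative imputers such as MICE, the following numpy sketch initializes missing cells with column means and then repeatedly regresses each feature on all the others. It is our own minimal rendering of the chained-equations idea, not any particular library's implementation:

```python
import numpy as np

def cyclic_impute(x, m, n_rounds=10):
    """MICE-style cyclic regression (illustrative): initialize missing
    cells with column means, then repeatedly refit an ordinary least
    squares model per feature and refresh its missing entries."""
    col_mean = np.nanmean(np.where(m == 1, x, np.nan), axis=0)
    x = np.where(m == 1, x, col_mean)            # mean initialization
    d = x.shape[1]
    for _ in range(n_rounds):
        for j in range(d):
            obs, mis = m[:, j] == 1, m[:, j] == 0
            if not mis.any() or not obs.any():
                continue                          # nothing to fit or fill
            others = np.delete(x, j, axis=1)
            A = np.c_[others, np.ones(len(x))]    # design matrix + intercept
            w, *_ = np.linalg.lstsq(A[obs], x[obs, j], rcond=None)
            x[mis, j] = A[mis] @ w                # refresh missing entries
    return x
```

Production implementations replace the least-squares step with richer per-feature models (e.g., random forests in MissForest) and add convergence checks.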

3. REMASKER

Next, we present REMASKER, an extremely simple yet effective method for imputing missing values of tabular data. We begin by formalizing the imputation problem.

3.1. PROBLEM FORMALIZATION

Incomplete data. To model tabular data with d features, we consider a d-dimensional random variable x ≜ (x_1, . . . , x_d) ∈ X_1 × . . . × X_d, where X_i is either continuous or categorical for i ∈ {1, . . . , d}. The observational access to x is mediated by a mask variable m ≜ (m_1, . . . , m_d) ∈ {0, 1}^d, which indicates the missing values of x, such that x_i is accessible only if m_i = 1. In other words, we observe x in its incomplete form x̃ ≜ (x̃_1, . . . , x̃_d) with

x̃_i ≜ { x_i if m_i = 1; * if m_i = 0 }  (i ∈ {1, . . . , d})  (1)

where * denotes the unobserved value.

Missingness mechanisms. Missing values occur for various reasons.
To simulate different scenarios, following prior work (Yoon et al., 2019; Jarrett et al., 2022), we consider three missingness mechanisms: MCAR ("missing completely at random") - the missingness does not depend on the data, which indicates that ∀m, x, x′, p(m|x) = p(m|x′); MAR ("missing at random") - the missingness depends only on the observed values, which indicates that ∀m, x, x′, if the observed values of x and x′ are the same, then p(m|x) = p(m|x′); and MNAR ("missing not at random") - the missingness depends on the missing values as well, which is the case whenever neither the MCAR nor the MAR definition holds. In general, it is impossible to identify the missingness distribution of MNAR without domain-specific assumptions or constraints (Ma & Zhang, 2021). Imputation task. In this task, we are given an incomplete dataset D ≜ {(x^(i), m^(i))}_{i=1}^n, which consists of n i.i.d. realizations of x and m. The goal is to recover the missing values of each input x̃ by generating an imputed version x̂ ≜ (x̂_1, . . . , x̂_d) such that

x̂_i ≜ x_i if m_i = 1, and x̂_i ≜ x̄_i if m_i = 0 (i ∈ {1, . . . , d}) (2)

where x̄_i is the imputed value.
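As an illustration of the simplest of these mechanisms, the following minimal numpy sketch simulates an MCAR mask (each entry observed independently with a fixed probability, as in our experimental setup in § 4) and applies it to produce the naturally masked input, using NaN as the * placeholder; the function names are illustrative, not part of REMASKER's implementation:

```python
import numpy as np

def mcar_mask(n, d, p_missing=0.3, rng=None):
    """Simulate an MCAR mask: each entry is observed (1) independently
    with probability 1 - p_missing, regardless of the data values."""
    rng = np.random.default_rng(rng)
    return (rng.random((n, d)) >= p_missing).astype(int)

def apply_mask(x, m):
    """Replace unobserved entries (m == 0) with NaN, the '*' placeholder."""
    x_tilde = x.astype(float).copy()
    x_tilde[m == 0] = np.nan
    return x_tilde

x = np.random.default_rng(0).normal(size=(5, 4))
m = mcar_mask(5, 4, p_missing=0.3, rng=0)
x_tilde = apply_mask(x, m)
```

Under MAR or MNAR, the mask would instead be drawn conditionally on (observed or missing) feature values, e.g., via a logistic model.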

3.2. DESIGN OF REMASKER

The REMASKER imputer extends the masked autoencoding (MAE) framework (Dosovitskiy et al., 2020; Bao et al., 2022; He et al., 2022) that reconstructs masked components based on observed components. As illustrated in Figure 1, REMASKER comprises an encoder that maps the observed values to their latent representations and a decoder that reconstructs the masked values from these representations. Unlike conventional MAE, however, the data in the imputation task is inherently incomplete (i.e., naturally masked), so we employ a "re-masking" approach that explicitly accounts for this incompleteness when applying masking and reconstruction. At a high level, REMASKER works in two phases: fitting - it optimizes the model with respect to the given dataset; and imputation - it applies the trained model to predict the missing values of the dataset. Re-masking. In the fitting phase, for each input x, in addition to its missing values, we also randomly select and mask out another subset (e.g., 25%) of x's values. Formally, letting m be x's mask, we define another mask vector m′ ∈ {0, 1}^d, which is randomly sampled without replacement following a uniform distribution. Together, m and m′ induce three index sets:

I_mask = {i | m_i = 0}, I_remask = {i | m_i = 1 ∧ m′_i = 0}, I_unmask = {i | m_i = 1 ∧ m′_i = 1}

Let x_mask, x_remask, and x_unmask respectively denote the masked, re-masked, and unmasked values. With a sufficient number of re-masked values, in addition to the missing values, we create a challenging task that encourages the model to learn missingness-invariant representations (more details in § 5). Note that re-masking is not applied in the imputation phase. Encoder. The encoder embeds each value using an encoding function and processes the resulting embeddings through a sequence of Transformer blocks. In our implementation, we apply a linear encoding function to each value x: enc(x) = wx + b, where w and b are learnable parameters.
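The three index sets above can be sketched as follows; this is a minimal numpy illustration that assumes the re-mask m′ is sampled uniformly without replacement over the observed entries only (the naturally missing entries need no re-masking), and the function name is our own:

```python
import numpy as np

def remask_split(m, remask_ratio=0.25, rng=None):
    """Given the natural mask m (1 = observed, 0 = missing), sample a re-mask
    m' over the observed entries and return (m', I_mask, I_remask, I_unmask)."""
    rng = np.random.default_rng(rng)
    observed = np.flatnonzero(m == 1)
    k = int(round(remask_ratio * observed.size))
    remasked = rng.choice(observed, size=k, replace=False)
    m_prime = np.ones_like(m)
    m_prime[remasked] = 0
    I_mask = np.flatnonzero(m == 0)                       # naturally missing
    I_remask = np.flatnonzero((m == 1) & (m_prime == 0))  # hidden for training
    I_unmask = np.flatnonzero((m == 1) & (m_prime == 1))  # visible to encoder
    return m_prime, I_mask, I_remask, I_unmask
```

The three sets partition {1, . . . , d}; during fitting, the encoder only sees x restricted to I_unmask.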
We also add positional encoding to x's embedding to encode x's position in the input (e.g., the k-th feature): pe(k, 2i) = sin(k/10000^{2i/d}), where k and i respectively denote x's position in the input and the dimension of the embedding, and d is the embedding width. Note that the encoder is applied only to the observed values: in the fitting phase, it operates on the observed values after re-masking (i.e., the unmasked set I_unmask); in the imputation phase, it operates on the non-missing values (i.e., the union of the re-masked and unmasked sets I_unmask ∪ I_remask), as illustrated in Figure 1. Decoder. The REMASKER decoder is instantiated as a sequence of Transformer blocks followed by an MLP layer. Unlike the encoder, the decoder operates on the embeddings of both observed and masked values. Following (Devlin et al., 2018; He et al., 2022), we use a shared, learnable mask token as the initial embedding of each masked value. The decoder first adds positional encoding to the embeddings of all the values (observed and masked), processes the embeddings through a sequence of Transformer blocks, and finally applies a linear projection that maps the embeddings to scalar values as the predictions. Similar to (He et al., 2022), we use an asymmetric design with a deep encoder and a shallow decoder (e.g., 8 blocks versus 4 blocks), which often suffices to reconstruct the masked values. Conventional MAE focuses on representation learning and uses the decoder only in the training phase; in REMASKER, the decoder is required to reconstruct the missing values and is thus used in both the fitting and imputation phases. Reconstruction loss. Recall that the REMASKER decoder predicts a value for each input feature. We define the reconstruction loss as the mean square error (MSE) between the reconstructed and original values on (i) the re-masked set I_remask and (ii) the unmasked set I_unmask.
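The value embedding, positional encoding, and reconstruction loss described above can be sketched as follows. This is a minimal numpy illustration: the cosine half of the positional encoding follows the standard Transformer convention (the text only spells out the sine component), and the function names are our own:

```python
import numpy as np

def embed_value(x, w, b):
    """Linear encoding of a scalar feature value: enc(x) = w * x + b,
    with learnable parameters w, b of shape (d_emb,)."""
    return w * x + b

def positional_encoding(k, d_emb):
    """Sinusoidal encoding of feature position k into a d_emb-dim vector:
    pe(k, 2i) = sin(k / 10000^(2i/d_emb)); odd dims use cos (assumed)."""
    i = np.arange(d_emb // 2)
    angles = k / (10000.0 ** (2 * i / d_emb))
    pe = np.zeros(d_emb)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

def reconstruction_loss(x, x_hat, m, m_prime, sets="both"):
    """MSE between reconstruction x_hat and original x. 'both' covers
    I_remask ∪ I_unmask (all observed entries); 'remask'/'unmask' cover one
    set. Naturally missing entries (m == 0) never contribute."""
    m, m_prime = m.astype(float), m_prime.astype(float)
    w = {"both": m, "remask": m * (1 - m_prime), "unmask": m * m_prime}[sets]
    return float(((x_hat - x) ** 2 * w).sum() / max(w.sum(), 1.0))
```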
We empirically experiment with different definitions of the reconstruction loss (e.g., over the re-masked set only, or over both the re-masked and unmasked sets) in § 4.3. Putting everything together, Algorithm 1 sketches the implementation of REMASKER.

4. EVALUATION

We evaluate the empirical performance of REMASKER in various scenarios using benchmark datasets. Our experiments are designed to answer the following key questions: (i) Does REMASKER work? - We compare REMASKER with a variety of state-of-the-art imputers in terms of imputation quality. (ii) How does it work? - We conduct an ablation study to assess the contribution of each component of REMASKER to its performance. (iii) What is the best way of using REMASKER? - We explore the use of REMASKER as a standalone imputer as well as a component of an ensemble imputer to understand its best practice. Datasets. For reproducibility and comparability, similar to prior work (Yoon et al., 2019; Jarrett et al., 2022), we use 12 real-world datasets from the UCI Machine Learning repository (Dua & Graff, 2017), with their characteristics deferred to Appendix § A.1. Missingness mechanisms. We consider three missingness mechanisms. In MCAR, the mask vector of each input is realized following a Bernoulli random variable with a fixed mean. In MAR, with a random subset of features fixed to be observable, the remaining features are masked using a logistic model. In MNAR, the input features of MAR are further masked following a Bernoulli random variable with a fixed mean. We use the HyperImpute platform (Jarrett et al., 2022) to simulate the above missingness mechanisms. Baselines.
We compare REMASKER with 13 state-of-the-art imputation methods: HyperImpute (Jarrett et al., 2022), a hybrid imputer that performs iterative imputation with automatic model selection; MIWAE (Mattei & Frellsen, 2018), an autoencoder model that fits missing data by optimizing a variational bound; EM (García-Laencina et al., 2010), an iterative imputer based on expectation-maximization optimization; GAIN (Yoon et al., 2019), a generative adversarial imputation network that trains the discriminator to classify the generator's output in an element-wise manner; ICE, an iterative imputer based on regularized linear regression; MICE, an ICE-like, iterative imputer based on Bayesian ridge regression; MIRACLE (Kyono et al., 2021), an iterative imputer that refines the imputation of a baseline by simultaneously modeling the missingness-generating mechanism; MissForest (Stekhoven & Buhlmann, 2012), an iterative imputer based on random forests; Mean (Hawthorne & Elliott, 2005), Median, and Frequent, which impute missing values using the column-wise unconditional mean, median, and most frequent values, respectively; Sinkhorn (Muzellec et al., 2020), an imputer trained through the optimal transport metrics of Sinkhorn divergences; and SoftImpute (Hastie et al., 2015), which performs imputation through soft-thresholded singular value decomposition. Metrics. For each imputation method, we evaluate imputation fidelity and utility by comparing its imputed data with the ground-truth data. In terms of fidelity, we mainly use two metrics: root mean square error (RMSE), which measures how well the individual imputed values match the ground-truth values, and the Wasserstein distance (WD), which measures how well the imputed distribution matches the ground-truth distribution. In terms of utility, we use the area under the receiver operating characteristic curve (AUROC) as the metric on applicable datasets (i.e., ones associated with classification tasks).
In the case of multi-class classification, we use the one-versus-rest (OvR) setting. For fairness, we use logistic regression as the predictive model in all cases.
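The fidelity metrics can be computed as in the following numpy sketch. It assumes that RMSE is restricted to the imputed (originally missing) entries and that WD is averaged per feature over equal-size samples, for which the 1-Wasserstein distance reduces to the mean absolute difference of sorted values; the exact evaluation protocol (normalization, averaging) follows the HyperImpute platform and may differ in detail:

```python
import numpy as np

def imputation_rmse(x_true, x_imputed, m):
    """RMSE over the imputed entries only (those with m == 0)."""
    miss = (m == 0)
    return float(np.sqrt(((x_true[miss] - x_imputed[miss]) ** 2).mean()))

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size empirical samples:
    the mean absolute difference of their sorted values."""
    return float(np.abs(np.sort(a) - np.sort(b)).mean())

def imputation_wd(x_true, x_imputed):
    """Average per-feature Wasserstein distance between the imputed and
    ground-truth marginal distributions."""
    return float(np.mean([wasserstein_1d(x_true[:, j], x_imputed[:, j])
                          for j in range(x_true.shape[1])]))
```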

4.1. OVERALL PERFORMANCE

We evaluate REMASKER and the baseline imputers on the benchmark datasets under the MAR setting with a 0.3 missingness ratio, with results summarized in Figure 2. Observe that REMASKER consistently outperforms all the baselines in terms of both fidelity (measured by RMSE and WD) and utility (measured by AUROC) across all the datasets. Recall that the benchmark datasets are collected from a variety of domains with highly varying characteristics (cf. Table 5): the dataset size varies from 308 to 20,060, while the number of features ranges from 7 to 57. REMASKER's superior performance across all the datasets demonstrates that it effectively models the intricate correlation among different features, even when the amount of available data is scarce. The only imputer with performance close to REMASKER is HyperImpute (Jarrett et al., 2022), an ensemble method that integrates multiple imputation models and automatically selects the most fitting model for each column of the given dataset. This highlights that the modeling capacity of REMASKER's masked autoencoder is comparable with that of ensemble models. In Appendix § B.1, we conduct a more comprehensive evaluation by simulating all three missingness scenarios (MCAR, MAR, and MNAR) with different missingness ratios. The results show that REMASKER consistently performs well across a wide range of settings.

4.2. SENSITIVITY ANALYSIS

To assess the factors influencing REMASKER's performance, we conduct a sensitivity analysis by varying the dataset size, the number of features, and the missingness ratio under the MAR setting. Figure 3 shows the performance of REMASKER in these experiments against the six closest competitors (HyperImpute, ICE, MissForest, GAIN, MIWAE, and Sinkhorn) on the letter dataset. We have the following observations. (a) The performance of REMASKER improves with the size of available data, while its advantage over other imputers (with the exception of HyperImpute) grows with the dataset size. (b) The number of features has a significant impact on the performance of REMASKER, with its advantage over other imputers increasing steadily with the number of features. This may be explained by the fact that REMASKER relies on learning holistic representations of inputs, and including more features contributes to better representation learning. (c) REMASKER is fairly insensitive to the missingness ratio. For instance, even with a 0.7 missingness ratio, it achieves RMSE below 0.1, suggesting that it effectively fits sparse datasets. In Appendix § B.2, we conduct an evaluation on other datasets with similar observations.

4.3. ABLATION STUDY

We further conduct an ablation study of REMASKER on the letter dataset to understand the contribution of different components to its performance. The results on other datasets are deferred to Appendix § B.3. Model design. The encoder and decoder of REMASKER can be flexibly designed. Here, we study the impact of three key parameters, the encoder depth (the number of Transformer blocks in the encoder), the embedding width (the dimensionality of latent representations), and the decoder depth, with results summarized in Table 1a, Table 1b, and Table 1c, respectively. Observe that the performance of REMASKER reaches its peak with a proper model configuration (encoder depth = 8, decoder depth = 8, and embedding width = 64). This observation suggests that the model complexity needs to fit the given dataset: the model needs to be sufficiently complex to effectively learn the holistic representations of inputs but not so complex as to overfit the dataset. We also compare the performance of REMASKER with different backbone models (i.e., Transformer, linear, and convolutional), with the number of layers and the size of each layer fixed at the default setting. As shown in Table 2, Transformer-based REMASKER largely outperforms the other variants, which may be explained by the fact that the self-attention mechanism of the Transformer effectively captures the intricate inter-feature correlation under limited data (Huang et al., 2020). Reconstruction loss. We define the reconstruction loss as the error between the reconstructed and original values on the re-masked values I_remask and the unmasked values I_unmask. We measure the performance of REMASKER under three different settings of the reconstruction loss: (i) I_remask ∪ I_unmask, (ii) I_remask only, and (iii) I_unmask only, on the letter and california datasets, with results shown in Table 3.
Observe that using the reconstruction loss on the unmasked values alone is insufficient, while adding the reconstruction loss of the unmasked values to that of the re-masked values improves performance, especially on the california dataset. This finding differs from the vision domain, in which computing the loss on unmasked image patches reduces accuracy (He et al., 2022). We hypothesize that this difference is explained as follows. Unlike conventional MAE, due to the naturally missing values in tabular data, relying on re-masked values alone provides limited supervisory signals. Moreover, while images are signals with heavy spatial redundancy (i.e., a missing patch can be recovered from its neighboring patches), tabular data tends to be highly semantic and information-dense. Thus, including the reconstruction loss of the unmasked values improves model training.

4.4. PRACTICE OF REMASKER

Finally, we explore the optimal practice of REMASKER. Training regime. The ablation study above uses 600 training epochs by default. Figure 4(a) shows the impact of the training length, in which we vary the number of training epochs from 100 to 1,600 and measure the performance of REMASKER on the letter dataset. Observe that the imputation performance improves steadily with longer training (as RMSE and WD decrease and AUROC increases) and does not fully saturate even at 1,600 epochs. However, for efficient training, it is often acceptable to terminate earlier (e.g., at 600 epochs) with sufficient imputation performance. To further validate the trainability of REMASKER, with the maximum number of training epochs fixed at 600 (which affects the learning rate scheduler), we measure the reconstruction loss as a function of the training epochs. As shown in Figure 4(b), the loss drops quickly within about 100 epochs and continues to decrease steadily thereafter, demonstrating the trainability of REMASKER. Masking ratio. The masking ratio controls the number of re-masked values (after excluding missing values). Table 4a shows its impact on the performance of REMASKER. Observe that the optimal ratio differs across datasets, which may be explained by their varying numbers of features (16 versus 9 in letter and california). Intuitively, a larger number of features affords a higher masking ratio to balance (i) encouraging the model to learn missingness-invariant representations and (ii) retaining sufficient supervisory signals to facilitate training. Standalone vs. ensemble. Besides using REMASKER as a standalone imputer, we explore its use as a base imputer within the ensemble imputation framework of HyperImpute, with results summarized in Table 4b.
Compared with the default setting (with mean substitution as the base imputer), using REMASKER as the base imputer improves the imputation performance, suggesting another effective way of operating REMASKER.

5. DISCUSSION

The empirical evaluation above shows REMASKER's superior performance in imputing missing values of tabular data. Next, we provide theoretical justification for its effectiveness. By extending the siamese form of MAE (Kong & Zhang, 2022), we show that REMASKER encourages learning missingness-invariant representations of input data, which requires a holistic understanding of the data even in the presence of missing values. Let f_θ(·) and d_ϑ(·) respectively be the encoder and decoder. For given input x, mask m, and re-mask m′, the reconstruction loss of REMASKER training is given by (here we focus on the reconstruction of re-masked values):

ℓ(x, m, m′) = ∥d_ϑ(f_θ(x ⊙ m ⊙ m′)) ⊙ (1 - m′) ⊙ m - x ⊙ (1 - m′) ⊙ m∥² (3)

where ⊙ denotes element-wise multiplication. Let m⁺ ≜ m ⊙ m′ and m⁻ ≜ m ⊙ (1 - m′). Eq (3) can be simplified as:

ℓ(x, m⁺, m⁻) = ∥d_ϑ(f_θ(x ⊙ m⁺)) ⊙ m⁻ - x ⊙ m⁻∥².

As the embedding dimensionality is typically much larger than the number of features, it is possible to make the autoencoder lossless. In other words, for a given encoder f_θ(·), there exists a decoder d_ϑ′(·) such that d_ϑ′(f_θ(x ⊙ m⁻)) ⊙ m⁻ ≈ x ⊙ m⁻. We can further re-write Eq (3) as:

ℓ(x, m⁺, m⁻) = ∥d_ϑ(f_θ(x ⊙ m⁺)) ⊙ m⁻ - d_ϑ′(f_θ(x ⊙ m⁻)) ⊙ m⁻∥²
s.t. ϑ′ = argmin_{ϑ′} E_{x′} ∥d_ϑ′(f_θ(x′ ⊙ m⁻)) ⊙ m⁻ - x′ ⊙ m⁻∥² (4)

We now define a new distance metric Δ_{ϑ,ϑ′}(z, z′) ≜ ∥(d_ϑ(z) - d_ϑ′(z′)) ⊙ m⁻∥². Then, Eq (4) is reformulated as:

ℓ(x, m⁺, m⁻) = Δ_{ϑ,ϑ′}(f_θ(x ⊙ m⁺), f_θ(x ⊙ m⁻))
s.t. ϑ′ = argmin_{ϑ′} E_{x′} ∥d_ϑ′(f_θ(x′ ⊙ m⁻)) ⊙ m⁻ - x′ ⊙ m⁻∥² (5)

Note that optimizing Eq (5) essentially minimizes the difference between x's representations under m⁺ and m⁻ (with respect to the decoder). As m⁺ and m⁻ mask out different values, this formulation promotes learning representations insensitive to missing values.
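The step from Eq (3) to its simplified form only uses m ⊙ m′ = m⁺ and m ⊙ (1 - m′) = m⁻; the identity can be checked numerically with arbitrary stand-in encoder/decoder maps (the functions below are placeholders, not REMASKER's networks):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
m = (rng.random(d) < 0.7).astype(float)         # natural mask
m_prime = (rng.random(d) < 0.75).astype(float)  # re-mask

# Arbitrary deterministic stand-ins for the encoder f_theta and decoder d_theta.
f = lambda z: np.tanh(z)
dec = lambda h: 2.0 * h + 0.1

# Eq (3), original form.
lhs = np.linalg.norm(dec(f(x * m * m_prime)) * (1 - m_prime) * m
                     - x * (1 - m_prime) * m) ** 2
# Simplified form with m+ = m ⊙ m' and m- = m ⊙ (1 - m').
m_plus, m_minus = m * m_prime, m * (1 - m_prime)
rhs = np.linalg.norm(dec(f(x * m_plus)) * m_minus - x * m_minus) ** 2
assert np.isclose(lhs, rhs)
```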
To validate the analysis above, we empirically measure the CKA similarity (Kornblith et al., 2019) between the latent representations (i.e., the output of REMASKER's encoder) of complete inputs and of inputs with missing values, with results shown in Figure 5. Observe that the CKA measures under different missingness ratios all steadily increase with the training length, indicating that REMASKER tends to learn missingness-invariant representations of tabular data, which may explain its imputation effectiveness.
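For reference, the linear variant of CKA used in such measurements can be computed as below; this is a generic numpy sketch of linear CKA (Kornblith et al., 2019) on column-centered features, not our exact measurement code:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices X, Y of shape (n, d):
    ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)
```

A value close to 1 indicates that the two sets of representations agree up to rotation and scaling; in Figure 5, X and Y would be the encoder outputs for complete and partially missing inputs.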

6. CONCLUSION

In this paper, we conduct a pilot study exploring the masked autoencoding approach for tabular data imputation. We present REMASKER, a novel imputation method that learns missingness-invariant representations of tabular data and effectively imputes missing values under various scenarios. With extensive evaluation on benchmark datasets, we show that REMASKER outperforms state-of-the-art methods in terms of both imputation utility and fidelity. Our findings indicate that masked tabular modeling represents a promising direction for future research on tabular data imputation.

B.1 OVERALL PERFORMANCE

Tables 6, 7, and 8 respectively show the imputation performance of REMASKER and 8 baselines on 12 benchmark datasets under the MAR, MCAR, and MNAR scenarios with the missingness ratio varying from 0.1 to 0.7. Observe that REMASKER performs on par with or outperforms almost all the baselines across a wide range of settings. Note that the MIRACLE imputer does not work on the Compression dataset and the Raisin dataset under some settings, for which the results are not reported. Given that both Compression and Raisin are relatively small datasets, one possible explanation is that MIRACLE requires a sufficient amount of data to train its model. Why does REMASKER generalize across the MAR, MCAR, and MNAR settings? One possible explanation is as follows. Recall that in MCAR, the mask vector of each input is realized following a Bernoulli random variable with a fixed mean; in MAR, with a random subset of features fixed to be observable, the remaining features are masked using a logistic model; in MNAR, the input features of MAR are further masked following a Bernoulli random variable with a fixed mean. Regardless of the missingness mechanism, it is rare that the values of one feature are missing across all the records. Thus, by its design, REMASKER is able to learn to reconstruct feature x_i conditional on the other features x_{-i} = (x_1, . . . , x_{i-1}, x_{i+1}, . . . , x_d). Yet, as reflected in the imputation results, this reconstruction-based learning performs better under MCAR, in which the missing values are evenly distributed across different features, than under MAR or MNAR, in which they are not.

B.2 SENSITIVITY ANALYSIS

Figure 9 shows the sensitivity analysis of REMASKER and 6 other baselines on the california dataset under the MAR, MCAR, and MNAR settings. The observed trends are generally similar to those in Figure 3, further supporting the observations made in § 4 about how different factors impact REMASKER's imputation performance.

B.3 ABLATION STUDY

The ablation study of REMASKER on the california dataset is shown in Table 7. Observe that the performance of REMASKER reaches its peak with encoder depth = 6, decoder depth = 4, and embedding width = 32.



Footnote 1: Without ambiguity, we omit the superscript i in the following notations.
Footnote 2: We have explored other encoding functions, including the periodic activation function (Gorishniy et al., 2022), which yields a slight decrease (e.g., ∼0.01 RMSE) in imputation performance.



W y d J 8 o a b C T I t r I 8 c S B I X U t z 7 P t R / x g 9 f 8 I

4 H c o t R a m 8 B x A W m g 9 n 6 I 1 y e A d z v g N B J v Q e j 7 J y 5 n v c x c m V O S o U c j n c L 9 t / 9 J t N 1 w j 3 f T + t j g / G q T v B 2 8 / H / V O P 6 z u s s 2 e s e f s F U v Z M T t l n 9 g Z G z J g l v 1 g P 9 mv i E U v o z f R 4 R K N O q u e f b Y W 0 c k f T r n E d Q = = < / l a t e x i t > [ ⇤ ] < l a t e x i t s h a 1 _ b a s e 6 4 = " M 8 l s V u i Y L j 9 z g o R 1 z L f O w C N a k l Y = " > A A A C f n i c b Z H b b h M x E I a d L Y c S T j 1 c c r M i A S F U 0 t 0 K a C 8 r u O G y S K S t i J f K O z t p r N j e l T 1 b s r L 2 P b i F t + J t c A 4 S J G U k S 7 9 m v h m N / 8 k r J R 0 l y e 9 O t H X n 7 r 3 7 2 w + 6 D x 8 9 f v J 0 Z 3 f v 3 J W 1 B R x C q U p 7 m Q u H S h o c k i S F l 5 V F o X O F F /n 0 4 7 x + c Y P W y d J 8 o a b C T I t r I 8 c S B I X U t z 7 P t R / x g 9 f 8 I

Figure 1: Overall framework of REMASKER. During the fitting stage, for each input, in addition to its missing values, another subset of values (the re-masked values) is randomly selected and masked out. The encoder is applied to the remaining values to generate an embedding, which is padded with mask tokens and processed by the decoder to reconstruct the re-masked values. During the imputation stage, the optimized model is applied to predict the missing values.
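The re-masking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `remask` and the NaN encoding of missing values are assumptions for the sake of the example.

```python
import numpy as np

def remask(x, remask_ratio=0.5, rng=None):
    """Given one input row with NaNs marking naturally missing values,
    randomly select an extra subset of the *observed* entries to re-mask.
    Returns the doubly-masked row and a boolean mask of re-masked positions."""
    rng = rng or np.random.default_rng()
    observed = ~np.isnan(x)                       # naturally observed positions
    candidates = np.flatnonzero(observed)
    k = int(len(candidates) * remask_ratio)
    remasked_idx = rng.choice(candidates, size=k, replace=False)
    remask_mask = np.zeros_like(observed)
    remask_mask[remasked_idx] = True
    x_masked = x.copy()
    x_masked[remask_mask] = np.nan                # hide re-masked values from the encoder
    return x_masked, remask_mask

x = np.array([0.2, np.nan, 0.7, 1.0, np.nan, 0.4])
x_masked, m = remask(x, remask_ratio=0.5)
# The reconstruction loss is computed only on positions where m is True;
# the naturally missing entries stay masked throughout fitting.
```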

Figure 3: Sensitivity analysis of REMASKER on the letter dataset under the MAR setting. The results are shown in terms of RMSE, WD, and AUROC, with the scores measured with respect to (a) the dataset size, (b) the number of features, and (c) the missingness ratio. The default setting is as follows: dataset size = 20,000, number of features = 16, and missingness ratio = 0.3.

Figure 4: (a) REMASKER performance with respect to the maximum number of training epochs; (b) Convergence of REMASKER's reconstruction loss. The experiments are performed on the letter dataset under MAR with 0.3 missingness ratio.

Figure 5: CKA similarity between the representations of complete and incomplete inputs (with the number of missing values controlled by the missingness ratio). The tested model is trained on letter under the MAR setting with 0.3 missingness ratio.
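For reference, CKA in its linear variant can be computed as below; this is a generic sketch of linear CKA between two representation matrices, and the paper's experiments may use a different (e.g., kernel) variant.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (n samples x d features).
    Returns a similarity in [0, 1]; 1 means identical representations up to
    orthogonal transformation and isotropic scaling."""
    X = X - X.mean(axis=0)                        # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
reps_complete = rng.normal(size=(50, 8))          # stand-in: embeddings of complete inputs
reps_masked = rng.normal(size=(50, 8))            # stand-in: embeddings of incomplete inputs
score = linear_cka(reps_complete, reps_masked)
```

A high CKA score between the embeddings of complete and incomplete inputs is what "missingness-invariant representations" refers to here.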

Figure 6a: Overall performance of REMASKER and 8 baselines on 12 benchmark datasets under the MAR scenario with 0.1 and 0.3 missingness ratios. The results are shown as the mean and standard deviation of RMSE, WD, and AUROC scores (AUROC is only applicable to datasets with classification tasks).

Figure 6b: Overall performance of REMASKER and 8 baselines on 12 benchmark datasets under the MAR scenario with 0.5 and 0.7 missingness ratios. The results are shown as the mean and standard deviation of RMSE, WD, and AUROC scores (AUROC is only applicable to datasets with classification tasks).

B.4 TRAINING REGIME

Figure 10 shows the imputation performance of REMASKER on the california dataset as the training length varies from 100 to 1,600 epochs. Figure 11 plots the convergence of the reconstruction loss in REMASKER, showing a trend similar to Figure 4(b).

Figure 7a: Overall performance of REMASKER and 8 baselines on 12 benchmark datasets under the MCAR scenario with 0.1 and 0.3 missingness ratios. The results are shown as the mean and standard deviation of RMSE, WD, and AUROC scores (AUROC is only applicable to datasets with classification tasks).

Figure 8a: Overall performance of REMASKER and 8 baselines on 12 benchmark datasets under the MNAR scenario with 0.1 and 0.3 missingness ratios. The results are shown as the mean and standard deviation of RMSE, WD, and AUROC scores (AUROC is only applicable to datasets with classification tasks).

Figure 9a: Sensitivity analysis of REMASKER on the california dataset under MAR and MNAR scenarios. The results are shown in terms of RMSE and WD, with the scores measured with respect to (a) the dataset size, (b) the number of features, and (c) the missingness ratio. The default setting is as follows: dataset size = 20,000, number of features = 9, and missingness ratio = 0.3.

Figure 10: REMASKER performance with respect to the number of training epochs on the california dataset under MAR with 0.3 missingness ratio.

Figure 11: Convergence of REMASKER's fitting on california under MAR with 0.3 missingness ratio.

Input: {x⁽ⁱ⁾}ⁿᵢ₌₁: incomplete dataset; remask: re-masking function; f_θ, d_ϑ: encoder and decoder; max_epoch: number of training epochs; ℓ: reconstruction loss
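The input list above corresponds to a fitting loop along the following lines. This is a skeleton under stated assumptions: the callables `f_theta`, `d_theta`, `loss_fn`, `step`, and the stub `_remask_stub` are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def fit_remasker(dataset, remask, f_theta, d_theta, loss_fn, step, max_epoch=10):
    """Skeleton of the fitting stage: for each epoch and each incomplete input x,
    re-mask a subset of observed values, encode the still-visible values, decode,
    and take an optimizer step on the loss over the re-masked positions only."""
    history = []
    for epoch in range(max_epoch):
        total = 0.0
        for x in dataset:
            x_masked, m = remask(x)               # hide an extra subset of observed values
            z = f_theta(x_masked)                 # embed the remaining values
            x_hat = d_theta(z)                    # reconstruct the full row
            loss = loss_fn(x_hat[m], x[m])        # loss only on re-masked entries
            step(loss)                            # parameter update (optimizer-specific)
            total += loss
        history.append(total / len(dataset))
    return history

def _remask_stub(x):
    # hypothetical re-mask for illustration: always hide position 0
    m = np.zeros(x.shape, dtype=bool)
    m[0] = True
    xm = x.copy()
    xm[m] = 0.0
    return xm, m

history = fit_remasker(
    dataset=[np.array([1.0, 2.0])],
    remask=_remask_stub,
    f_theta=lambda x: x,                          # stand-in encoder
    d_theta=lambda z: z,                          # stand-in decoder
    loss_fn=lambda a, b: float(np.mean((a - b) ** 2)),
    step=lambda loss: None,                       # no-op optimizer step
    max_epoch=3,
)
```

At imputation time, the same trained encoder/decoder pair is applied with only the natural missingness mask, and the decoder's outputs at the missing positions serve as the imputed values.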

Overall performance of REMASKER and baseline imputers on 12 benchmark datasets under MAR with 0.3 missingness ratio. The results are shown as the mean and standard deviation of RMSE, WD, and AUROC scores (AUROC is only applicable to datasets with classification tasks). Note that REMASKER outperforms all the baseline imputers under at least one metric across all the datasets.

Ablation study of REMASKER on the letter dataset. The default setting is as follows: encoder depth = 8, decoder depth = 6, embedding width = 64, masking ratio = 50%, and training epochs = 600.

Performance of REMASKER with different backbone models.

(a) REMASKER performance with respect to masking ratio; (b) REMASKER as the base imputer within HyperImpute. The results are evaluated on letter and california under MAR with 0.3 missingness ratio.

Characteristics of the datasets used in the experiments.

Ablation study of REMASKER on the california dataset. The default setting is as follows: encoder depth = 6, decoder depth = 4, embedding width = 32, masking ratio = 50%, and training epochs = 600.

Code availability: https://anonymous.4open.science/

Annex

Figure 7b: Overall performance of REMASKER and 8 baselines on 12 benchmark datasets under the MCAR scenario with 0.5 (panel c) and 0.7 (panel d) missingness ratios. The results are shown as the mean and standard deviation of RMSE, WD, and AUROC scores (AUROC is only applicable to datasets with classification tasks).

