ROBUST LEARNING OF FIXED-STRUCTURE BAYESIAN NETWORKS IN NEARLY-LINEAR TIME

Abstract

We study the problem of learning Bayesian networks where an ε-fraction of the samples are adversarially corrupted. We focus on the fully-observable case where the underlying graph structure is known. In this work, we present the first nearly-linear time algorithm for this problem with a dimension-independent error guarantee. Previous robust algorithms with comparable error guarantees are slower by at least a factor of d/ε, where d is the number of variables in the Bayesian network and ε is the fraction of corrupted samples. Our algorithm and analysis are considerably simpler than those in previous work. We achieve this by establishing a direct connection between robust learning of Bayesian networks and robust mean estimation. As a subroutine in our algorithm, we develop a robust mean estimation algorithm whose runtime is nearly-linear in the number of nonzeros in the input samples, which may be of independent interest.

1. INTRODUCTION

Probabilistic graphical models (Koller & Friedman, 2009) offer an elegant and succinct way to represent structured high-dimensional distributions. Inference and learning in probabilistic graphical models are important problems that arise in many disciplines (see Wainwright & Jordan (2008) and the references therein) and have been studied extensively during the past decades (see, e.g., Chow & Liu (1968); Dasgupta (1997); Abbeel et al. (2006); Wainwright et al. (2006); Anandkumar et al. (2012); Santhanam & Wainwright (2012); Loh & Wainwright (2012); Bresler et al. (2013; 2014); Bresler (2015)).

Bayesian networks (Jensen & Nielsen, 2007) are an important family of probabilistic graphical models that represent conditional dependence by a directed graph (see Section 2 for a formal definition). In this paper, we study the problem of learning Bayesian networks where an ε-fraction of the samples are adversarially corrupted. We focus on the simplest setting: all variables are binary and observable, and the structure of the Bayesian network is given to the algorithm. Formally, we work with the following corruption model.

Definition 1.1 (ε-Corrupted Set of Samples). Given 0 < ε < 1/2 and a distribution family P on R^d, the algorithm first specifies the number of samples N, and N samples X_1, X_2, ..., X_N are drawn from some unknown P ∈ P. The adversary inspects the samples, the ground-truth distribution P, and the algorithm, and then replaces εN samples with arbitrary points. The resulting set of N points is given to the algorithm as input. We say that a set of samples is ε-corrupted if it is generated by this process.

This is a strong corruption model that generalizes many existing models. In particular, it is stronger than Huber's contamination model (Huber, 1964), because we allow the adversary to both add bad samples and remove good samples, and to do so adaptively. Our goal is to design robust algorithms for learning Bayesian networks with dimension-independent error. More specifically, given as input an ε-corrupted set of samples drawn from some ground-truth Bayesian network P and the graph structure of P, we want the algorithm to output a Bayesian network Q such that the total variation distance between P and Q is upper bounded by a function that depends only on ε (the fraction of corruption) but not on d (the number of variables in P).
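To make the corruption model of Definition 1.1 concrete, the generation process can be sketched as follows. This is a minimal illustration, not part of the paper's algorithm: the function names and the particular adversary (which plants the all-ones point to bias empirical means) are hypothetical choices for the sake of the example.

```python
import random

def eps_corrupted_samples(sample_fn, adversary_fn, n, eps, seed=0):
    """Sketch of Definition 1.1: draw n i.i.d. clean samples, then let an
    adversary who sees all of them replace floor(eps * n) samples with
    arbitrary points of its choosing."""
    rng = random.Random(seed)
    samples = [sample_fn(rng) for _ in range(n)]
    k = int(eps * n)
    # The adversary may inspect every clean sample (and, in the model,
    # the ground-truth distribution and the algorithm) before acting.
    return adversary_fn(samples, k, rng)

# Clean samples from a product of Bernoulli(p_i) coordinates, i.e. a
# Bayesian network over the empty graph (hypothetical running example).
def bernoulli_product(rng, p=(0.1, 0.2, 0.3)):
    return tuple(int(rng.random() < pi) for pi in p)

# One possible adversary: replace k samples with the all-ones point.
def all_ones_adversary(samples, k, rng):
    d = len(samples[0])
    return [(1,) * d] * k + samples[k:]

corrupted = eps_corrupted_samples(
    bernoulli_product, all_ones_adversary, n=1000, eps=0.1)
```

Even this simple adversary shifts every coordinate of the empirical mean by up to ε, which is why naive estimators incur error growing with d and why dimension-independent guarantees require robust estimation.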

