GENERALIZED PRECISION MATRIX FOR SCALABLE ESTIMATION OF NONPARAMETRIC MARKOV NETWORKS

Abstract

A Markov network characterizes the conditional independence structure, or Markov property, among a set of random variables. Existing work focuses on specific families of distributions (e.g., exponential families) and/or certain graph structures, and most methods can handle only variables of a single data type (continuous or discrete). In this work, we characterize the conditional independence structure of general distributions for all data types (i.e., continuous, discrete, and mixed-type) with a Generalized Precision Matrix (GPM). Moreover, we allow general functional relations among variables, thus giving rise to a Markov network structure learning algorithm in one of the most general settings. To address the computational challenge of the problem, especially for large graphs, we unify all cases under the same umbrella of a regularized score matching framework. We validate the theoretical results and demonstrate the scalability of our approach empirically in various settings.

1. INTRODUCTION

Markov networks (also known as Markov random fields) represent conditional dependencies among random variables. They provide clear semantics in a graphical manner to cope with uncertainty in probability theory, with wide applications in fields including physics (Cimini et al., 2019), chemistry (Dodani et al., 2016), biology (Jaimovich et al., 2006), and sociology (Carrington et al., 2005). The undirected nature of the edges also allows cyclic, overlapping, or hierarchical interactions (Shen et al., 2009). To estimate a Markov network from observational data, existing work focuses on certain parametric families of distributions, a majority of which study the Gaussian case. When the variables follow a multivariate Gaussian distribution, the dependencies are exactly represented by the support of the precision, or inverse covariance, matrix, according to the Hammersley–Clifford theorem (Besag, 1974; Grimmett, 1973). Together with various statistical estimators (e.g., the graphical lasso (Friedman et al., 2008) and neighborhood selection (Meinshausen & Bühlmann, 2006)), this connection between the precision matrix and the graph structure has been well exploited in the Gaussian case over the past decades (Yuan, 2010; Ravikumar et al., 2011). However, methods for Gaussian graphical models fail to correctly capture dependencies among variables that deviate from Gaussianity or interact nonlinearly (Raskutti et al., 2008; Ravikumar et al., 2011). While non-Gaussianity is common in real-world data-generating processes, few results are applicable to Markov network structure learning on non-Gaussian data. In the discrete setting, Ravikumar et al. (2010) showed that a binary Ising model can be recovered by neighborhood selection using ℓ1-penalized logistic regression.
Loh & Wainwright (2013) encoded extra structural relations in their proposed generalized covariance matrix to model the dependencies in Markov networks with certain structures (e.g., tree structures or graphs with only singleton separator sets) among variables from exponential families. Several approaches allow estimation for non-Gaussian continuous variables, but most of them make parametric assumptions such as exponential families (Yang et al., 2015; Lin et al., 2016; Suggala et al., 2017) or Gaussian copulas (Liu et al., 2009; 2012; Harris & Drton, 2013). These methods illustrate the possibility of reliable Markov network estimation in several non-Gaussian cases, but the models remain restricted to specific parametric families of distributions and/or structures of conditional independencies.
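The Gaussian connection discussed above can be made concrete with a small numerical sketch. The precision matrix below describes a hypothetical 3-node chain X1 – X2 – X3 (the values are illustrative choices for this example, not taken from the paper): the zero entry in position (1, 3) encodes the conditional independence X1 ⊥ X3 | X2, even though X1 and X3 are marginally correlated.

```python
import numpy as np

# Illustrative precision matrix for a 3-node chain X1 - X2 - X3.
# The zero in position (0, 2) encodes X1 _||_ X3 | X2.
theta = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])

# Covariance of the implied zero-mean Gaussian.
sigma = np.linalg.inv(theta)

# Marginally, X1 and X3 are correlated: sigma[0, 2] is nonzero.
print(sigma[0, 2])  # 0.25

# The partial correlation of X_i and X_j given the rest is
# -theta_ij / sqrt(theta_ii * theta_jj); here it vanishes,
# so the graph has no edge between X1 and X3.
partial_corr = -theta[0, 2] / np.sqrt(theta[0, 0] * theta[2, 2])
print(partial_corr)  # 0.0
```

Estimators such as the graphical lasso recover exactly this sparsity pattern of the precision matrix from samples; the nonparametric setting of this paper generalizes the object whose support is being recovered.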

