DEEP ECOLOGICAL INFERENCE

Abstract

We introduce an efficient approximation to the loss function for the ecological inference problem, where individual labels are predicted from aggregates. This allows us to construct ecological versions of linear models, deep neural networks, and Bayesian neural networks. Using these models, we infer probabilities of vote choice for candidates in the Maryland 2018 midterm elections for 2,322,277 voters in 2,055 precincts. We show that increased network depth and joint learning of multiple races within an election improve the accuracy of ecological inference when compared to benchmark data from polling. Additionally, we leverage data on the joint distribution of ballots (available from ballot images, which are public for election administration purposes) to show that joint learning leads to significantly improved recovery of the covariance structure for multi-task ecological inference. Our approach also allows learning latent representations of voters, which we show outperform raw demographics for leave-one-out prediction.

1. INTRODUCTION

Ecological inference (EI), or learning labels from label proportions, is the problem of making predictions about individual units from observations of aggregates. The canonical case is voting. We cannot observe individual people's votes, but people live in precincts, and we know the final vote count for each precinct. The problem is to estimate the probability that a particular type of individual voted for a given candidate. Because we cannot observe individual labels, but only sums over pre-specified groups of labels, non-identifiability is inherent to the ecological inference problem. The possibility of interaction effects between any relevant demographics and the aggregation groups themselves also means that Simpson's-paradox-type confounding is an ever-present risk. The most basic approach to this problem assumes total homogeneity within each precinct and simply assigns the final distribution of votes in a precinct to each person living in that precinct. Typically, however, people are sorted geographically along characteristics that are politically salient, and that variation can be leveraged to learn about voting patterns based on those demographics. Classical ecological regressions use aggregate demographics, but here we have access to individual-level demographics via a commercial voter file with individual records, and we therefore construct our models at the individual level. There are a number of advantages to using individual demographics for ecological inference, but note that while individual-level features are observed, individual-level labels still cannot be observed, and therefore the fundamental challenges of non-identifiability and aggregation paradoxes remain.

Related Work. Classical ecological inference typically assumes an underlying individual linear model and constructs estimators for the coefficients of that model using aggregated demographics and labels (King, 1997).
More recent work has used distribution regression for large-scale ecological inference, incorporating Census microdata in nationwide elections in the US (Flaxman et al., 2016). Aggregated labels represent a substantial loss of information that could otherwise be used to constrain inferences, and all ecological methods rely on the analyst making assumptions that are not definitively empirically testable from those aggregates alone. Some research has been done on visual techniques for determining when some of these assumptions may have been violated (Gelman et al., 2001). Other work has sought to impose additional constraints on the ecological problem by incorporating information from multiple elections (Park et al., 2014), which also has the benefit of allowing for estimation of voter transitions, which are themselves of interest.

Our models are built from individual records of commercial voter file data, which leads to some immediate differences from classical methods. First, we do not need to determine the composition of the electorate from an ecological model: administrative records already tell us who in the total population voted (who votes is public), so we need an ecological model only for vote choice. Second, we can directly specify a model for vote choice at the individual level and train it using a suitable ecological loss function, as opposed to specifying a model only on aggregates. This is convenient, allows the analyst a great deal of flexibility in modeling choices, and opens the possibility of using ecological inference for individual-level vote choice prediction as well as individual-level latent space modeling. However, it does not automatically resolve the fundamental difficulties of ecological methods. We therefore introduce additional constraints on the ecological problem by jointly training on multiple races in a given election year, in a manner analogous to existing methods but adapted to our individual-level framework.
Contributions. We develop a loss function that approximates the Poisson binomial and Poisson multinomial losses, which sit at the center of the individually oriented approach to EI but are intractable to optimize directly. This approach allows us to extend EI with deep learning, providing three key benefits: learning non-linear relationships in data, jointly learning multiple aggregated outcomes, and learning low-dimensional, high-information representations of individual voters. We apply these methods to data from the Maryland 2018 midterm election to estimate vote propensity in the elections for Governor, US Senator, and Attorney General. We validate these estimates using three datasets: first, post-election survey data on individuals' vote choices; second, data on the joint distribution of candidate support for these three races; and third, individual survey responses linked to the learned representations, which we use to validate those representations by predicting the responses.
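To make the Poisson binomial loss concrete, the following is a minimal sketch for a single precinct and a single two-candidate race. It is not the approximation developed in this paper: it computes the exact distribution of the precinct vote count by adding one voter at a time, which is feasible only for small precincts. The function names and the three-voter probabilities are hypothetical and chosen purely for illustration.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of the number of votes for a candidate in one precinct.

    probs[i] is the (hypothetical) probability that voter i supports the
    candidate; voters are assumed independent but not identically distributed.
    """
    pmf = [1.0]  # with zero voters, the count is 0 with probability 1
    for p_i in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p_i)  # voter i does not support the candidate
            nxt[k + 1] += mass * p_i      # voter i supports the candidate
        pmf = nxt
    return pmf


def precinct_neg_log_lik(probs, votes_for):
    """Negative log-likelihood of observing `votes_for` votes in the precinct."""
    return -math.log(poisson_binomial_pmf(probs)[votes_for])


# Illustrative precinct with three voters (made-up probabilities).
probs = [0.9, 0.5, 0.2]
pmf = poisson_binomial_pmf(probs)
loss = precinct_neg_log_lik(probs, votes_for=2)
```

This recursion costs on the order of N^2 operations for a precinct of N voters with two outcomes. With C outcomes, the table must track every feasible vector of counts, which grows roughly as N^(C-1); this gives one way to see why exact evaluation becomes impractical at the scale considered here.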

2.1. APPROXIMATING THE POISSON BINOMIAL AND MULTINOMIAL LOSS

We model the choice of candidate (including abstaining from voting in a particular race) as a realization from a categorical distribution with an individually defined vector of probabilities $p^{(i)}$, so that each individual's "vote" is a one-hot encoded vector in which the indicator marks the candidate they selected. If individuals are modeled as independent realizations of non-identically distributed categorical variables, then precinct-level counts are distributed as Poisson multinomial variables, parameterized by a matrix $P$ in which each row corresponds to an individual's vector of probabilities. The log-likelihood of the observed precinct-level counts is then:

$$L(v_1, v_2, \ldots, v_C \mid P) = \log\left( \sum_{A \in F_C} \; \prod_{(i, c(i)) \in A} P^{(i)}_{c(i)} \right) \tag{1}$$

where $F_C$ is the set of all possible assignments $(i, c(i))$ of an individual $i$ to a vote choice $c(i)$, subject to the constraint that the total number of individuals assigned to each candidate $c$ is $v_c$, so that $|F_C| = \frac{\left(\sum_{c=1}^{C} v_c\right)!}{\prod_{c=1}^{C} v_c!}$. Even for the simplest possible case of a single race with only two candidates, this loss is not straightforward to compute (Hong, 2013). In practice this loss is intractable to compute for the voting setting, where $C > 3$ (the United States has a two-party system, and people need not vote in every race on the ballot) and $N > 20{,}000$. We propose to approximate this loss by assuming as

