LEARNING HYPER LABEL MODEL FOR PROGRAMMATIC WEAK SUPERVISION

Abstract

To reduce human annotation effort, the programmatic weak supervision (PWS) paradigm abstracts weak supervision sources as labeling functions (LFs) and uses a label model to aggregate the outputs of multiple LFs into training labels. Most existing label models require a parameter learning step for each dataset. In this work, we present a hyper label model that (once learned) infers the ground-truth labels for each dataset in a single forward pass, without dataset-specific parameter learning. The hyper label model approximates an optimal analytical (yet computationally intractable) solution of the ground-truth labels. We train the model on synthetic data generated in a way that ensures the model approximates the analytical optimal solution, and build the model upon a Graph Neural Network (GNN) to ensure that the model prediction is invariant (or equivariant) to the permutation of LFs (or data points). On 14 real-world datasets, our hyper label model outperforms the best existing methods in both accuracy (by 1.4 points on average) and efficiency (by six times on average). Our code is available at https://github.com/wurenzhi/hyper_label_model
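The invariance/equivariance property stated above can be checked concretely for any aggregator that reduces over the LF axis with a symmetric function: permuting the LFs (columns of the label matrix) leaves the output unchanged, while permuting the data points (rows) permutes the output accordingly. A minimal NumPy sketch of this check (illustrative only; the aggregator here is a plain mean, not the paper's GNN):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5, 4))  # label matrix: 5 data points, 4 LFs

def aggregate(X):
    # Symmetric reduction over the LF axis -> one score per data point.
    return X.mean(axis=1)

col_perm = rng.permutation(4)  # a permutation of the LFs
row_perm = rng.permutation(5)  # a permutation of the data points

# Invariance to LF permutation: the output is unchanged.
assert np.allclose(aggregate(X), aggregate(X[:, col_perm]))
# Equivariance to data-point permutation: the output is permuted the same way.
assert np.allclose(aggregate(X)[row_perm], aggregate(X[row_perm]))
```

Any learned label model with these two properties produces the same inferred labels regardless of the order in which LFs or data points are presented.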

1. INTRODUCTION

The lack of labeled training data is a major challenge impeding the practical application of machine learning (especially deep learning) techniques. Practitioners have therefore increasingly turned to weak supervision, in which large amounts of cheaply generated noisy labels are used. Weak supervision sources take many forms, e.g., external knowledge bases (Mintz et al., 2009), existing pre-trained models (Das et al., 2020; Wu et al., 2022b), and heuristics/rules (Shin et al., 2015). To unify different sources, the programmatic weak supervision (PWS) paradigm (Ratner et al., 2016; 2017; Zhang et al., 2022) was proposed. In PWS, the user expresses each available weak supervision signal with a labeling function (LF), a small program that takes in a data point and outputs a noisy label. Each LF is then applied to unlabeled data of arbitrary size to obtain a noisy label vector; a label aggregation model (also referred to as a label model in the literature) then aggregates all noisy label vectors to infer the unknown ground-truth labels. The inferred labels can be used to train any downstream end model. The PWS paradigm has been successful in various tasks (Wu et al., 2018; Fries et al., 2019; Lison et al., 2020; Wu et al., 2021; 2020; Li et al., 2021) and industry scenarios (Mathew et al., 2021; Bach et al., 2019; Dunnmon et al., 2020).

The core challenge in PWS is how to aggregate all noisy label vectors to infer the ground-truth labels. Let the label matrix X denote the noisy labels, where each column X[:, j] is the noisy label vector from the j-th LF and each row X[i, :] contains the weak labels of the i-th data point; let y denote the ground-truth label vector. Most existing label models assume an underlying distribution p(y[i]|X[i, :]; θ) (Zhang et al., 2022), where y[i] is the label of the i-th data point and θ is the parameter of the distribution.
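As a toy illustration of this setup, the sketch below applies a few hypothetical LFs to unlabeled text, assembles the label matrix X, and aggregates with an unweighted majority vote — the simplest possible label model, shown here only to make the notation concrete (real label models instead learn per-LF parameters such as accuracies). All LF names and data are made up for illustration:

```python
import numpy as np

# Hypothetical labeling functions for a toy binary task: each LF maps a
# data point to a noisy label in {0, 1}.
def lf_contains_digit(text):
    return 1 if any(c.isdigit() for c in text) else 0

def lf_is_long(text):
    return 1 if len(text) > 10 else 0

def lf_has_exclamation(text):
    return 1 if "!" in text else 0

data = ["call 555-0134 now!", "hello", "meet me at noon tomorrow"]
lfs = [lf_contains_digit, lf_is_long, lf_has_exclamation]

# Label matrix X: X[i, j] is the noisy label the j-th LF assigns to the
# i-th data point (columns = LFs, rows = data points).
X = np.array([[lf(x) for lf in lfs] for x in data])

# Simplest possible aggregation: unweighted majority vote over the LFs.
y_hat = (X.mean(axis=1) > 0.5).astype(int)
print(y_hat)  # -> [1 0 0]
```

Majority vote treats all LFs as equally accurate; the probabilistic label models discussed next replace it with a parameterized distribution whose parameters must be learned per dataset.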
The parameter θ is first learned from the weak labels X = (X[1, :], X[2, :], . . . ) in an unsupervised and typically iterative way, and inference is then made using p(y[i]|X[i, :]; θ). In this approach, the parameter θ is dataset-specific and has to be learned anew for every different X (dataset). In contrast to existing solutions, we propose a hyper label model with the goal of reducing assumptions and eliminating dataset-specific parameter learning. Specifically, we aim to develop a hyper model that enjoys

