BRIDGING THE GAP BETWEEN SEMI-SUPERVISED AND SUPERVISED CONTINUAL LEARNING VIA DATA PROGRAMMING

Abstract

Semi-supervised continual learning (SSCL) has shown its utility in learning cumulative knowledge with partially labeled data per task. However, the state of the art has yet to explicitly address how to reduce the performance gap between using partially labeled data and fully labeled data. In response, we propose a general-purpose SSCL framework, namely DP-SSCL, that uses data programming (DP) to pseudo-label the unlabeled data of each task, and then cascades both ground-truth-labeled and pseudo-labeled data to update a downstream supervised continual learning model. The framework includes a feedback loop that brings mutual benefits: on one hand, DP-SSCL inherits guaranteed pseudo-labeling quality from DP techniques to improve continual learning, approaching the performance of fully supervised training; on the other hand, knowledge transferred from previous tasks facilitates training of the DP pseudo-labeler, taking advantage of cumulative information via self-teaching. Experiments show that (1) DP-SSCL bridges the performance gap, approaching the final accuracy and catastrophic forgetting of training on fully labeled data, (2) DP-SSCL outperforms existing SSCL approaches at low cost, with up to 25% higher final accuracy and lower catastrophic forgetting on standard benchmarks, while reducing memory overhead from the 100 MB level to the 1 MB level at the same time complexity, and (3) DP-SSCL is flexible, maintaining steady performance while supporting plug-and-play extension of a variety of supervised continual learning models.

1. INTRODUCTION

Lifelong machine learning, also known as continual learning (CL), is a machine learning paradigm that accumulates knowledge over sequential tasks (Ruvolo & Eaton, 2013a; Silver et al., 2013; Chen & Liu, 2016; Liu, 2017). It empowers machine learning at the application level: an agent need not be trained from scratch with large amounts of data for every new task, and it can continue to improve on previously learned tasks by learning post-deployment. Nevertheless, researchers have identified that obtaining labeled training data is expensive (Olivier et al., 2006; Settles, 2009), a problem that semi-supervised continual learning (SSCL) addresses (Baucum et al., 2017; Wang et al., 2021; Smith et al., 2021). As the name suggests, SSCL leverages not only labeled data but also unlabeled task data to construct a cumulative knowledge base for learning agents, reducing labeling cost in applied machine learning.

Despite these research efforts, the state of the art in SSCL (Baucum et al., 2017; Wang et al., 2021; Smith et al., 2021) has yet to address an elephant in the room: closing the performance gap between supervised and semi-supervised CL. Ideally, learning from n_L labeled and n_U unlabeled data points per task should provide the same lifelong performance as if all n_L + n_U data points were labeled, but state-of-the-art SSCL frameworks have not approached this goal, and they rarely consider the computational cost required to do so. Moreover, multiple supervised CL tools have matured (Lee et al., 2019; Yoon et al., 2018; Bulat et al., 2020) and would likely benefit from extension to the semi-supervised setting, but current SSCL approaches are architecture-specific, making such extension non-trivial.
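The cascade described above, pseudo-labeling the unlabeled pool and merging it with the labeled pool before a supervised update, can be sketched in miniature. This is an illustrative toy under our own assumptions, not the paper's implementation: the labeling functions and the simple majority-vote aggregator below stand in for a real data-programming labeler (e.g., a learned label model), and the "continual learner" is omitted entirely.

```python
# Toy sketch of the SSCL cascade: weak labeling functions vote on each
# unlabeled example, majority vote produces a pseudo-label, and the
# labeled + pseudo-labeled pools are concatenated into one training set
# for a downstream supervised (continual) learner.
from collections import Counter

ABSTAIN = -1  # convention: a labeling function returns -1 when unsure

def majority_vote(votes):
    """Aggregate labeling-function votes into a single pseudo-label."""
    valid = [v for v in votes if v != ABSTAIN]
    if not valid:
        return ABSTAIN  # every function abstained; drop this example
    return Counter(valid).most_common(1)[0][0]

def pseudo_label(unlabeled, labeling_functions):
    """Return (x, pseudo_y) pairs for examples with at least one vote."""
    pseudo = []
    for x in unlabeled:
        y = majority_vote([lf(x) for lf in labeling_functions])
        if y != ABSTAIN:
            pseudo.append((x, y))
    return pseudo

# Toy task: classify integers as even (0) or odd (1).
labeled = [(2, 0), (3, 1)]
lfs = [
    lambda x: x % 2,                    # always votes parity
    lambda x: 1 if x % 2 else ABSTAIN,  # only fires on odd numbers
]
# Cascade: ground-truth-labeled data plus pseudo-labeled data.
task_data = labeled + pseudo_label([4, 5, 7], lfs)
```

In the full framework, `task_data` would be fed to a supervised continual learning model for the current task, and knowledge from earlier tasks would in turn inform the pseudo-labeler, forming the feedback loop described in the abstract.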

