NOISE + 2NOISE: CO-TAUGHT DENOISING AUTOENCODERS FOR TIME-SERIES DATA

Under review as a conference paper at ICLR 2023

Abstract

We consider the task of learning to recover clean signals given only access to noisy data. Recent work in computer vision has addressed this problem in the context of images using denoising autoencoders (DAEs). However, to date DAEs for learning from noisy data have not been explored in the context of time-series data. DAEs for denoising images often rely on assumptions unlikely to hold in the context of time-series, e.g., multiple noisy samples of the same example. Here, we adapt DAEs to cleaning time-series data with noisy samples only. To recover the clean target signal when only given access to noisy target data, we leverage a noise-free auxiliary time-series signal that is related to the target signal. In addition to leveraging the relationship between the target signal and auxiliary signal, we iteratively filter and learn from clean samples using an approach based on co-teaching. Applied to the task of recovering carbohydrate values for blood glucose management, our approach reduces noise (MSE) in patient-reported carbohydrates from 72g² (95% CI: 54, 93) to 18g² (13, 25), outperforming the best baseline (MSE 33g² (27, 43)). We demonstrate strong time-series denoising performance, extending the applicability of DAEs to a previously under-explored setting.

Denoising autoencoders (DAEs) (Vincent et al., 2008) have been used to accurately denoise various signals, including medical images (Gondara, 2016), ECG signals (Xiong et al., 2016), and power system measurements (Lin et al., 2019). With respect to time-series data, DAEs have been used for forecasting (Romeu et al., 2015), classification (Zheng et al., 2022), and imputation (Zhang & Yin, 2019), but generally require access to clean samples at training and do not provide de-noised outputs. In many real-world settings, clean samples are unavailable at training. Work in computer vision has addressed this problem through extensions that either require paired samples (Lehtinen et al., 2018) or rely on patch-based analysis (Krull et al., 2018; Laine et al., 2019; Xie et al., 2020; Batson & Royer, 2019). Similar approaches do not extend to time-series data, where paired samples rarely exist and patch-based techniques do not apply. Beyond approaches that rely on paired samples or patch-based analyses, researchers have recently proposed techniques that utilize knowledge of the noise distribution to recover the clean signal. These approaches either use the properties of the distribution to recover the clean signal after training on noisy data (Kim & Ye, 2021; Moran et al., 2019), or rely on the noise having low expectation and variance compared to the signal, in which case a model trained on noisy data can approximate one trained on clean data (Xu et al., 2020). While these approaches may be considered in a time-series setting (and are treated as baselines here), their applicability is limited, as noise in time-series settings is rarely weak or known.

Our Contribution. In light of this gap, we adapt denoising autoencoders for time-series data. Our approach, 'Noise + 2Noise', learns to map a noisy target signal to a clean signal given only noisy samples and an auxiliary clean signal. Inspired by work in image denoising (Lehtinen et al., 2018; Xu et al., 2020), we add additional noise to the noisy target signal during training and attempt to recover the original noisy signal. Provided that the noise has low expectation and variance, a network trained in this manner can learn to recover the true signal because the noise will minimally impact the expected value of the output of the network (Xu et al., 2020). The auxiliary signal is input along with the target signal into a denoising autoencoder, which allows our network to leverage the relationship between the auxiliary and target signals. To address the fact that the signal-to-noise ratio might not be weak, we adapt a co-teaching approach to train two DAEs (Jiang et al., 2018; Han et al., 2018). We use this approach to identify the cleaner samples; the most likely low-noise samples are identified as the low-loss samples of the other model and used for backpropagation.
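The training recipe described above, corrupting the already-noisy target a second time and asking the network to reproduce the original noisy signal while conditioning on the auxiliary signal, can be illustrated with a minimal numpy sketch. The toy signals, the linear stand-in for the DAE, and all noise levels below are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: clean target x (unobserved), noisy observation y = x + noise,
# and a noise-free auxiliary signal a related to x.
T = 200
x = np.sin(np.linspace(0, 4 * np.pi, T))   # clean target (never seen)
a = 0.5 * x + 0.1                          # related auxiliary signal
y = x + rng.normal(0.0, 0.3, T)            # noisy observation

# A linear "DAE" on inputs [re-noised target, auxiliary, bias], trained
# to reproduce the *noisy* target y, per the Noise + 2Noise recipe.
w = np.zeros(3)
lr = 0.05
for _ in range(2000):
    n2 = rng.normal(0.0, 0.3, T)                   # extra injected noise
    inp = np.stack([y + n2, a, np.ones(T)], axis=1)
    grad = 2 * inp.T @ (inp @ w - y) / T           # MSE gradient
    w -= lr * grad

# Because the injected noise is zero-mean, the learned map's output
# tracks the clean signal more closely than the raw observation does.
denoised = np.stack([y, a, np.ones(T)], axis=1) @ w
print(np.mean((y - x) ** 2), np.mean((denoised - x) ** 2))
```

On this toy problem the second (denoised) error is typically well below the first, even though the clean signal x was never used in training; the auxiliary input is what lets the model separate signal from noise.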

This co-teaching approach has never been applied to de-noising in a time-series or any other setting. It has also never been utilized in a continuous output setting. By adapting this co-teaching approach to DAEs, we provide a solution to denoising in this novel (time-series) setting.

Real-world Inspiration. Disparate levels of noise across variables are common in data streams. Measurement reliability can vary across sensors, from essentially noiseless to highly corrupted. Throughout this work we take inspiration from a real-world problem affecting millions in the US: blood glucose management for people with diabetes.

Co-teaching (Han et al., 2018), which builds off of MentorNet (Jiang et al., 2018), is performed by training two networks in parallel. Each network is backpropagated using only the samples within the current mini-batch for which the loss of the other network is lowest. Intuitively, samples with incorrect labels are likely to have higher loss and therefore be removed. Using two networks in parallel provides robustness to outliers and initially misclassified samples, which single-network boosting-style approaches are sensitive to. These approaches have been primarily explored in a supervised setting. In contrast, we consider an unsupervised setting in which labels are unavailable and, instead, the input signals themselves are corrupted. To the best of our knowledge, such a co-teaching approach has not been explored in the context of denoising.
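The small-loss exchange at the heart of co-teaching can be sketched on a toy regression task. The data, the linear models, the corruption rate, and the keep fraction below are illustrative assumptions of ours, not the setup used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D regression with a 30% rate of corrupted targets.
N, keep_frac = 256, 0.7            # keep the 70% smallest-loss samples
X = rng.uniform(-1, 1, (N, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, N)
bad = rng.random(N) < 0.3
y[bad] += rng.normal(0, 5.0, bad.sum())   # corrupt 30% of targets

w1, w2 = rng.normal(size=1), rng.normal(size=1)
lr, k = 0.1, int(keep_frac * N)

def step(w_self, w_other):
    # Rank samples by the *other* network's per-sample squared loss,
    # then update only on its small-loss (likely clean) samples.
    losses = (X @ w_other - y) ** 2
    idx = np.argsort(losses)[:k]
    grad = 2 * X[idx].T @ (X[idx] @ w_self - y[idx]) / k
    return w_self - lr * grad

for _ in range(500):
    w1, w2 = step(w1, w2), step(w2, w1)   # peers exchange small-loss samples

print(w1, w2)   # both weights should land near the clean slope of 3.0
```

Because corrupted targets sit far from anything either model can fit, they incur large losses and are filtered out by the peer's ranking, so both models converge toward the clean relationship despite 30% label corruption.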

