CONTRASTIVE LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION OF TIME SERIES

Abstract

Unsupervised domain adaptation (UDA) aims to learn a machine learning model from a labeled source domain that performs well on a similar yet different, unlabeled target domain. UDA is important in many applications such as medicine, where it is used to adapt risk scores across different patient cohorts. In this paper, we develop a novel framework for UDA of time series data, called CLUDA. Specifically, we propose a contrastive learning framework to learn contextual representations in multivariate time series, so that these representations preserve label information for the prediction task. In our framework, we further capture the variation in the contextual representations between source and target domain via a custom nearest-neighbor contrastive learning. To the best of our knowledge, ours is the first framework to learn domain-invariant, contextual representations for UDA of time series data. We evaluate our framework on a wide range of time series datasets to demonstrate its effectiveness, and we show that it achieves state-of-the-art performance for time series UDA.

1. INTRODUCTION

Many real-world applications of machine learning are characterized by differences between the domains at training and deployment time (Hendrycks & Dietterich, 2019; Koh et al., 2021). Therefore, effective methods are needed that learn domain-invariant representations across domains. For example, it is well known that medical settings suffer from substantial domain shifts due to differences in patient cohorts, medical routines, reporting practices, etc. (Futoma et al., 2020; Zech et al., 2018). Hence, a machine learning model trained on one patient cohort may not generalize to other patient cohorts. This highlights the need for effective domain adaptation of time series.

Unsupervised domain adaptation (UDA) aims to learn a machine learning model using a labeled source domain that performs well on a similar yet different, unlabeled target domain (Ganin et al., 2016; Long et al., 2018). So far, many methods for UDA have been proposed for computer vision (Chen et al., 2020a; Ganin et al., 2016; Huang et al., 2021; Kang et al., 2019; Long et al., 2018; Pei et al., 2018; Shu et al., 2018; Singh, 2021; Sun & Saenko, 2016; Tang et al., 2021; Tzeng et al., 2014; 2017; Xu et al., 2020; Zhu et al., 2020). These works can, in principle, be applied to time series (with some adjustment of their feature extractors); however, they are not explicitly designed to fully leverage time series properties. In contrast, comparatively few works have focused on UDA of time series. Here, previous works utilize a tailored feature extractor to capture the temporal dynamics of multivariate time series, typically through recurrent neural networks (RNNs) (Purushotham et al., 2017), long short-term memory (LSTM) networks (Cai et al., 2021), or convolutional neural networks (Liu & Xue, 2021; Wilson et al., 2020; 2021).
Some of these works minimize the domain discrepancy of the learned features via adversarial-based methods (Purushotham et al., 2017; Wilson et al., 2020; 2021; Jin et al., 2022) or via restrictions through metric-based methods (Cai et al., 2021; Liu & Xue, 2021). Another research stream has developed time series methods for transfer learning from the source domain to the target domain (Eldele et al., 2021; Franceschi et al., 2019; Kiyasseh et al., 2021; Tonekaboni et al., 2021; Yang & Hong, 2022; Yèche et al., 2021; Yue et al., 2022). These methods pre-train a neural network via contrastive learning to capture the contextual representation of time series from an unlabeled source domain. However, these methods assume a labeled target domain, which is different from UDA. To the best of our knowledge, there is no method for UDA of time series that captures and aligns the contextual representation across source and target domains.

In this paper, we propose a novel framework for unsupervised domain adaptation of time series data based on contrastive learning, called CLUDA. Different from existing works, our CLUDA framework aims at capturing the contextual representation in multivariate time series as a form of high-level features. To accomplish this, we incorporate the following components: (1) We minimize the domain discrepancy between source and target domains through adversarial training. (2) We capture the contextual representation by generating positive pairs via a set of semantic-preserving augmentations and then learning their embeddings. For this, we make use of contrastive learning (CL). (3) We further align the contextual representations across source and target domains via a custom nearest-neighborhood contrastive learning.

We evaluate our method using a wide range of time series datasets. (1) We conduct extensive experiments using the established benchmark datasets WISDM (Kwapisz et al., 2011), HAR (Anguita et al., 2013), and HHAR (Stisen et al., 2015).
We show that CLUDA increases the accuracy on target domains by a significant margin. (2) We further conduct experiments on two large-scale, real-world medical datasets, namely MIMIC-IV (Johnson et al., 2020) and AmsterdamUMCdb (Thoral et al., 2021). We demonstrate the effectiveness of our framework in this medical setting and confirm its superior performance over state-of-the-art baselines. In fact, medical settings are known to suffer from substantial domain shifts across health institutions (Futoma et al., 2020; Nestor et al., 2019; Zech et al., 2018). This highlights the relevance of and practical need for adapting machine learning models across the domains of training and deployment.

Contributions:
1. We propose a novel contrastive learning framework (CLUDA) for unsupervised domain adaptation of time series. To the best of our knowledge, ours is the first UDA framework that learns a contextual representation of time series to preserve information on labels.
2. We capture domain-invariant, contextual representations in CLUDA through a custom approach combining nearest-neighborhood contrastive learning and adversarial learning to align them across domains.
3. We demonstrate that our CLUDA achieves state-of-the-art performance. Furthermore, we show the practical value of our framework using large-scale, real-world medical data from intensive care units.
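To make component (2) above concrete, the following is a minimal NumPy sketch of an InfoNCE-style contrastive objective over the embeddings of two augmented views of a batch: each sample's augmented view serves as its positive, and all other samples in the batch serve as negatives. This is for exposition only; the function name, temperature value, and toy embeddings are illustrative and simplify the actual loss used in our framework.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss: for each sample i, the embedding z2[i] of its augmented
    view is the positive; all z2[j], j != i, act as negatives."""
    # L2-normalize so that dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    # row-wise log-softmax; positives lie on the diagonal
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy usage: embeddings of two semantic-preserving augmentations of a batch,
# where matched views stay close in embedding space.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
z_b = z_a + 0.05 * rng.normal(size=(8, 16))
loss = info_nce_loss(z_a, z_b)
```

Minimizing this loss pulls matched views together and pushes the remaining batch samples apart. Roughly speaking, the nearest-neighborhood contrastive learning of component (3) can be viewed as replacing the diagonal positives with nearest-neighbor embeddings drawn from the other domain, which aligns the contextual representations across domains.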

2. RELATED WORK

Contrastive learning: Contrastive learning aims to learn representations with self-supervision, such that similar samples are embedded close to each other (positive pairs) while dissimilar samples are pushed away (negative pairs). Such representations have been shown to capture the semantic information of the samples by maximizing a lower bound on the mutual information between two augmented views (Bachman et al., 2019; Tian et al., 2020a; b). Several methods for contrastive learning have been developed so far (Oord et al., 2018; Chen et al., 2020b; Dwibedi et al., 2021; He et al., 2020), several of which are tailored to unsupervised representation learning of time series (Franceschi et al., 2019; Yèche et al., 2021; Yue et al., 2022; Tonekaboni et al., 2021; Kiyasseh et al., 2021; Eldele et al., 2021; Yang & Hong, 2022; Zhang et al., 2022). A detailed review is in Appendix A.

Unsupervised domain adaptation: Unsupervised domain adaptation leverages a labeled source domain to predict the labels of a different, unlabeled target domain (Ganin et al., 2016). To achieve this, UDA methods typically aim to minimize the domain discrepancy and thereby decrease the upper bound on the target error (Ben-David et al., 2010). To minimize the domain discrepancy, existing UDA methods can be loosely grouped into three categories:



Code is available at https://github.com/oezyurty/CLUDA.



(1) Adversarial-based methods reduce the domain discrepancy via domain discriminator networks, which enforce that the feature extractor learns domain-invariant feature representations. Examples are DANN (Ganin et al., 2016), CDAN (Long et al., 2018), ADDA (Tzeng et al., 2017), MADA (Pei et al., 2018), DIRT-T (Shu et al., 2018), and DM-ADA (Xu et al., 2020). (2) Contrastive methods reduce the domain discrepancy by minimizing a contrastive loss that aims to bring source and target embeddings of the same class

