MIDAS: MULTI-INTEGRATED DOMAIN ADAPTIVE SUPERVISION FOR FAKE NEWS DETECTION

Anonymous

Abstract

COVID-19-related misinformation and fake news, coined an 'infodemic', have increased dramatically over the past few years. This misinformation exhibits concept drift: the distribution of fake news changes over time, reducing the effectiveness of previously trained models for fake news detection. Given a set of fake news models trained on multiple domains, we propose an adaptive decision module to select the best-fit model for a new sample. We propose MIDAS, a multi-domain adaptive approach for fake news detection that ranks the relevance of existing models to new samples. MIDAS contains two components: a domain-invariant encoder and an adaptive model selector. MIDAS integrates multiple pre-trained and fine-tuned models with their training data to create a domain-invariant representation. It then uses local Lipschitz smoothness of the invariant embedding space to estimate each model's relevance to a new sample: higher-ranked models provide predictions, and lower-ranked models abstain. We evaluate MIDAS on generalization to drifted data with nine fake news datasets, each obtained from a different domain and modality. MIDAS achieves new state-of-the-art performance on multi-domain adaptation for out-of-distribution fake news classification.

1. INTRODUCTION

The misinformation and fake news associated with the COVID-19 pandemic, called an 'infodemic' by the WHO (Enders et al., 2020), have grown dramatically and evolved with the pandemic. Fake news has eroded institutional trust (Ognyanova et al., 2020) and has increasingly negative impacts outside social communities (Quinn et al., 2021). The challenge is to filter active fake news campaigns while they are raging, just as today's email spam filters operate online, instead of detecting them offline and retrospectively, long after the campaigns have ended. We divide this challenge of detecting fake news online into two parts: (1) the variety of data (both real and fake), and (2) the timeliness of data collection and processing (both real and fake). In this paper, we focus on the first part (variety), leaving timeliness, which depends on solutions for handling variety, to future work (Pu et al., 2020). The infodemic, and fake news more generally, evolves with a growing variety of ephemeral topics and content, a phenomenon called real concept drift (Gama et al., 2014). However, models with excellent results on single-domain classification (Chen et al., 2021) have difficulty generalizing in cross-domain experiments (Wahle et al., 2022; Suprem & Pu, 2022). A benchmark study of 15 language models shows reduced cross-domain fake news detection accuracy (Wahle et al., 2022). A generalization study (Suprem & Pu, 2022) finds significant performance deterioration when models are applied to unseen, non-overlapping datasets. Intuitively, it is entirely reasonable that state-of-the-art models trained on one dataset or time period will have reduced accuracy on future time periods.
Real concept drift is introduced into fake news through content changes (Gama et al., 2014), camouflage (Shrestha & Spezzano, 2021), linguistic drift (Eisenstein et al., 2014), and adversarial adaptation by fake news producers when faced with debunking efforts such as the CDC's on the pandemic (Weinzierl et al., 2021). To keep up with concept drift, classification models need to be expanded to cover a wider variety of datasets (Li et al., 2021; Suprem & Pu, 2022; Kaliyar et al., 2021), or augmented with new knowledge about true novelty, such as the appearance of the Omicron variant (Pu et al., 2020). In this paper, we assume the availability of domain-specific authoritative sources, such as the CDC and WHO, that provide trusted, up-to-date information on the pandemic. A key challenge for such multi-domain classifiers is a decision module that selects the best-fit model among a set of existing models to classify new samples. A model's fitness is defined by the overlap between an unlabeled sample and that model's training dataset (Suprem & Pu, 2022). Intuitively, a best-fit model better captures a sample point's neighborhood in its own training data (Urner & Ben-David, 2013; Chen et al., 2022).

MIDAS. We propose MIDAS, a multi-domain adaptive approach for early fake news detection, with potential for online filtering. MIDAS integrates multiple pre-trained and fine-tuned models along with their training data to create a domain-invariant representation. On this representation, MIDAS uses a notion of local Lipschitz smoothness to estimate the overlap, and therefore the relevance, between a new sample and each model's training dataset. This overlap estimate is used to rank models by relevance to the new sample. MIDAS then selects the highest-ranked model to perform classification. We evaluate MIDAS on nine fake news datasets obtained from different domains and modalities, and show new state-of-the-art performance on multi-domain adaptation for early fake news classification.
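The relevance-ranking step described above can be sketched as follows. This is an illustrative reading of the local-Lipschitz idea, not the paper's exact procedure: the neighborhood size `k`, the pairwise difference-ratio estimator, and the `1/(1+L)` relevance score are all our assumptions.

```python
import numpy as np

def local_lipschitz_relevance(sample_emb, source_embs, source_logits, k=5):
    """Estimate one source model's relevance to a sample via local Lipschitz
    smoothness: if the model's outputs vary little over the sample's nearest
    training neighbors, the sample lies in a smooth, well-covered region of
    that model's training data.

    sample_emb:    (d,) domain-invariant embedding of the new sample
    source_embs:   (n, d) embeddings of the source model's training data
    source_logits: (n, c) the source model's outputs on its own training data
    """
    dists = np.linalg.norm(source_embs - sample_emb, axis=1)
    nn = np.argsort(dists)[:k]  # k nearest training points to the sample
    # Local Lipschitz estimate: max output change per unit embedding distance
    # over pairs of neighbors (smaller = smoother = more relevant).
    ratios = []
    for i in nn:
        for j in nn:
            if i == j:
                continue
            dx = np.linalg.norm(source_embs[i] - source_embs[j]) + 1e-8
            dy = np.linalg.norm(source_logits[i] - source_logits[j])
            ratios.append(dy / dx)
    return 1.0 / (1.0 + max(ratios))  # in (0, 1]; higher = more relevant

def rank_models(sample_emb, per_model_data):
    """Rank source models by relevance; the top-ranked model predicts,
    the rest abstain. per_model_data: list of (embeddings, logits) pairs."""
    scores = [local_lipschitz_relevance(sample_emb, e, l)
              for (e, l) in per_model_data]
    return int(np.argmax(scores)), scores
```

A model whose outputs are locally flat around the sample's neighbors receives a score near 1, while a model with sharply varying outputs there scores lower and would abstain.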
Contributions. Our contributions are as follows:
1. MIDAS: a framework for adaptive model selection that uses sample-to-data overlap to measure model relevance.
2. Experimental results for MIDAS on nine fake news datasets, achieving state-of-the-art results using unsupervised domain adaptation.

2. RELATED WORK

2.1. MULTI-DOMAIN ADAPTATION

Domain adaptation maps a target domain into a source domain, allowing a classifier learned on the source domain to predict target-domain samples (Farahani et al., 2021). Some approaches learn a domain-invariant representation between source and target (Huang et al., 2021); a new classifier can then be trained on this invariant representation for both source and target samples. Domain invariance scales to multiple source domains by fusing their latent representations with an adversarial encoder-discriminator framework (Li et al., 2021). In multi-source domain adaptation (MDA), classifiers for each source carry different weights: static weights based on distance (Li et al., 2021) or per-sample weights based on the l2 norm (Suprem et al., 2020).
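A minimal sketch of per-sample weighting in MDA. We assume, for illustration only, that each source is summarized by a centroid in the shared latent space and that weights come from a softmax over negative l2 distances; neither choice is prescribed by the cited works.

```python
import numpy as np

def per_sample_weights(sample_emb, source_centroids, temperature=1.0):
    """Weight each source classifier per sample: sources whose training data
    lies closer to the sample (l2 in the shared latent space) get larger
    weights. A softmax over negative distances keeps the weights positive
    and summing to 1."""
    d = np.linalg.norm(source_centroids - sample_emb, axis=1)
    z = -d / temperature
    z -= z.max()                 # numerical stability before exponentiating
    w = np.exp(z)
    return w / w.sum()

def ensemble_predict(sample_emb, source_probs, source_centroids):
    """Combine per-source class probabilities with per-sample weights.
    source_probs: (k, c) class probabilities, one row per source model."""
    w = per_sample_weights(sample_emb, source_centroids)
    return w @ source_probs      # (k,) @ (k, c) -> (c,)
```

Static weighting corresponds to computing `w` once per source; the per-sample variant recomputes it for every incoming sample.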

2.2. LABEL CONFIDENCE

Alongside domain adaptation, weak supervision (WS) is also commonly used to propagate labels from source domains to a target domain (Ratner et al., 2017). Both approaches estimate labels close to the true labels of target-domain samples, under the assumption that the source domains or labeling functions, respectively, correlate with the true labels through expertise and domain knowledge. In either case, whether MDA or WS, the domains or labeling functions must be weighted to ensure reliance on the best-fit source.
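The weighting step common to MDA and WS can be sketched as a weighted vote over sources. The abstain convention (`-1`) and the argmax tally below are illustrative assumptions, not the aggregation used by any specific WS system.

```python
import numpy as np

def aggregate_labels(votes, weights, n_classes=2, abstain=-1):
    """Weighted vote over labeling functions / source models.
    votes:   (m,) predicted label per source, `abstain` where a source abstains
    weights: (m,) per-source reliability weights
    Returns the label whose weighted tally is largest."""
    tally = np.zeros(n_classes)
    for v, w in zip(votes, weights):
        if v != abstain:          # abstaining sources contribute nothing
            tally[v] += w
    return int(np.argmax(tally))
```

A single highly weighted (best-fit) source can thus outvote several weakly weighted ones, which is the behavior both MDA and WS weighting schemes aim for.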

3. PROBLEM SETUP AND STRATEGY

Let there be $k$ source data domains with labels, $\{X_i, Y_i\}_{i=1}^{k} \in \{D_i\}_{i=1}^{k}$. Each source has an associated source model (SM), for a total of $k$ SMs $\{f_i\}_{i=1}^{k}$, where we have access to each model's training data $X_i$ and weights $w_i$. Each SM yields hidden embeddings through a feature-extractor backbone, or foundation model (Bommasani et al., 2021). Embeddings are projected to class probabilities with any type of classification layer or module.
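To make the setup concrete, here is a minimal sketch of one source model $f_i$ in this notation. The dataclass, the identity backbone, and the linear-softmax head in the usage are illustrative stand-ins, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

@dataclass
class SourceModel:
    """One source model f_i: a feature-extractor backbone producing hidden
    embeddings, a source-specific head projecting embeddings to class
    probabilities, and the training data X_i it was fitted on (retained so
    a selector can estimate sample-to-data overlap)."""
    backbone: Callable[[np.ndarray], np.ndarray]  # raw input -> embedding
    head: Callable[[np.ndarray], np.ndarray]      # embedding -> class probs
    train_X: np.ndarray                           # X_i

    def embed(self, x: np.ndarray) -> np.ndarray:
        return self.backbone(x)

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        return self.head(self.embed(x))
```

Any pre-trained encoder can play the backbone role; the classification head is deliberately left as an arbitrary callable, matching the statement that any classification layer or module may project embeddings to class probabilities.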



Snorkel (Ratner et al., 2017) uses expert labeling functions and weighs them by conditional independence. Similarly, the approaches in (Chen et al., 2020; Fu et al., 2020) use the coverage of expert foundation models and weigh them by distance to the embedded sample. EEWS (Rühling Cachay et al., 2021) directly combines source data and labeling functions in its estimator parametrization to generate dynamic weights for each sample. MDA approaches weigh sources with weak supervision (Li et al., 2021), distance (Suprem & Pu, 2022), or as a team of experts (Pu et al., 2020).

