LEARNING A MAX-MARGIN CLASSIFIER FOR CROSS-DOMAIN SENTIMENT ANALYSIS

Anonymous

Abstract

Sentiment analysis is a costly yet necessary task for enterprises that study the opinions of their customers to improve their products and services and to determine optimal marketing strategies. Because products and services span a wide range of domains, cross-domain sentiment analysis methods have received significant attention in recent years. These methods mitigate the domain gap between different applications by training cross-domain generalizable classifiers, which relaxes the need for annotating data separately for each domain. Most existing methods focus on learning domain-agnostic representations that are invariant with respect to both the source and the target domains. As a result, a classifier that is trained using annotated data in a source domain generalizes well in a related target domain. In this work, we introduce a new domain adaptation method that induces large margins between different classes in an embedding space based on the notion of a prototypical distribution. This embedding space is trained to be domain-agnostic by matching the data distributions across the domains. Large margins in the source domain help to reduce the effect of "domain shift" on the performance of a trained classifier in the target domain. Theoretical and empirical analyses are provided to demonstrate that the method is effective.

1. INTRODUCTION

The main goal in sentiment classification is to predict the polarity of user opinions automatically after collecting their feedback, e.g., Amazon customer reviews. The popularity of online shopping and reviews, fueled further by the recent pandemic, provides a valuable resource for businesses to study the behavior and preferences of consumers and to align their products and services with market demand. A major challenge for automatic sentiment analysis is that polarity is expressed using completely dissimilar terms and phrases in different domains. For example, while terms such as "fascinating" and "boring" are used to describe books, terms such as "tasty" and "stale" are used to describe food products. As a result of this discrepancy, a model that is trained for a particular domain may not generalize well in other domains, referred to as the problem of "domain gap" (Wei et al., 2018). Since generating annotated training data for all domains is expensive and time-consuming, cross-domain sentiment analysis has gained significant attention recently (Saito et al., 2018; Li et al., 2017; Peng et al., 2018; He et al., 2018; Li et al., 2018; Barnes et al., 2018; Sarma et al., 2019; Li et al., 2019; Guo et al., 2020; Xi et al., 2020; Dai et al., 2020; Lin et al., 2020). The goal in cross-domain sentiment classification is to relax the need for data annotation by transferring knowledge from a domain with annotated data to domains with unannotated data. This problem has been studied more broadly in the "domain adaptation" literature. A common approach for domain adaptation is to map data points from the two domains into a shared embedding space in order to align the data distributions (Redko & Sebban, 2017). Since the embedding space becomes domain-agnostic, a classifier that is trained using the source-domain annotated data will generalize in the target domain.
In the sentiment analysis problem, this means that the polarity of natural language can be expressed in the embedding space independently of the domain. We can model this embedding space as the output of a shared deep encoder which is trained to align the distributions of both domains at its output. This training procedure has been implemented either using adversarial learning (Pei et al., 2018; Long et al., 2018; Li et al., 2019; Dai et al., 2020), which aligns distributions indirectly, or using loss functions that are designed to align the two distributions directly (Peng et al., 2018; Barnes et al., 2018; Kang et al., 2019; Guo et al., 2020; Xi et al., 2020; Lin et al., 2020).

Contributions: our main contribution is to develop a new cross-domain sentiment analysis algorithm for model adaptation by introducing large margins between classes in the source domain. Our idea is based on learning a prototypical distribution for the source domain in a cross-domain embedding space which is trained to be domain-agnostic. We model this distribution as a Gaussian mixture model (GMM). We estimate the parameters of the prototypical distribution using a subset of source samples for which the classifier is confident about its predictions. As a result, larger margins between classes are introduced in the prototypical distribution, which helps to reduce the domain gap. We then use this prototypical distribution to align the source and the target distributions by minimizing the Sliced Wasserstein Distance (SWD) (Lee et al., 2019). We draw confident random samples from this distribution and enforce that the target distribution in the embedding space matches this prototypical distribution in addition to the source distribution. We provide a theoretical proof to demonstrate that our method minimizes an upper bound on the target-domain expected error. Experimental results demonstrate that our algorithm outperforms state-of-the-art sentiment analysis algorithms.
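The SWD term admits a simple Monte-Carlo estimator: project both sample sets onto random one-dimensional directions, where the Wasserstein distance has a closed form obtained by sorting. The following is a minimal NumPy sketch of this estimator; the function name, the number of projections, and the equal-sample-size assumption are our illustrative choices, not details taken from this paper.

```python
import numpy as np

def sliced_wasserstein(z_src, z_tgt, n_proj=50, rng=None):
    """Monte-Carlo estimate of the (squared) sliced Wasserstein distance
    between two embedding sets of equal shape (n, d).

    Each random projection reduces the problem to a 1-D Wasserstein
    distance, which is computed in closed form by sorting.
    """
    rng = np.random.default_rng(rng)
    d = z_src.shape[1]
    # Random projection directions on the unit sphere.
    theta = rng.normal(size=(d, n_proj))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    # Project both sample sets and sort each 1-D projection.
    p_src = np.sort(z_src @ theta, axis=0)
    p_tgt = np.sort(z_tgt @ theta, axis=0)
    # Average squared 1-D transport cost over samples and projections.
    return np.mean((p_src - p_tgt) ** 2)
```

In practice this loss is differentiable almost everywhere (sorting is a fixed permutation locally), which is why SWD can be minimized directly with gradient descent on the encoder parameters.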

2. RELATED WORK

While domain adaptation methods for visual domains usually use generative adversarial networks (GANs) (Goodfellow et al., 2014) and align distributions indirectly, the dominant approach for cross-domain sentiment analysis is to design appropriate loss functions that directly impose domain alignment. The main reason is that natural language is expressed in terms of discrete values such as words, phrases, and sentences. Because this domain is discrete, the mapping from natural language to real-valued vectors is not differentiable, so adversarial learning procedures cannot be easily implemented for pure natural language processing (NLP) applications. Several alignment loss functions have been designed for cross-domain sentiment analysis. One group of methods aligns the lower-order distributional moments, e.g., means and covariances, across the two domains in an embedding space (Wu & Huang, 2016; Peng et al., 2018; Sarma et al., 2019; Guo et al., 2020). An improvement over these methods is to use probability distribution metrics, which capture the information encoded in higher-order statistics (Shen et al., 2018). Damodaran et al. (Bhushan Damodaran et al., 2018) demonstrated that using the Wasserstein distance (WD) for domain alignment boosts performance significantly in visual domain applications (Long et al., 2015; Sun & Saenko, 2016). In the current work, we rely on the sliced Wasserstein distance (SWD) for aligning distributions. SWD has been used for domain adaptation in visual domains (Lee et al., 2019). The major reason for the performance degradation of a source-trained model in a target domain is "domain shift", i.e., the boundaries between the classes change in the embedding space even for related domains, which in turn increases the possibility of misclassification.
It has been argued that if a max-margin classifier is trained in the source domain, it generalizes better than many methods that try to align distributions without further model adaptation (Tommasi & Caputo, 2013). Inspired by the notion of "class prototypes", our method both aligns distributions in the embedding space and induces larger margins between classes using the notion of "prototypical distributions". Recently, cross-domain alignment of class prototypes has been used for domain adaptation (Pan et al., 2019; Chen et al., 2019). The idea is that when a deep network classifier is trained in a domain with annotated data, the data points of each class form separable clusters in an embedding space, modeled via the network responses in hidden layers. A class prototype is defined as the mean of each class-specific data cluster in the embedding space. Domain adaptation can then be addressed by aligning the prototypes across the two domains as a surrogate for distributional alignment. Following the above, our work is based on using the prototypical distribution, rather than simply the prototypes, to induce a maximum margin between the class-specific clusters after an initial training phase in the source domain. Since the prototypical distribution is multimodal, we estimate it using a Gaussian mixture model (GMM). We estimate the GMM using the source samples for which the classifier is confident, and use random samples with high-confidence labels to induce larger margins between classes compared to using the original source-domain data.
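The two steps above, fitting a GMM on confidently classified source embeddings and then drawing only high-confidence samples from it, can be sketched with scikit-learn as follows. The function names, the confidence thresholds `tau`, and the per-class warm start of the mixture means are our assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_prototypical_gmm(z, y_pred, confidence, n_classes, tau=0.9):
    """Fit one Gaussian mode per class on confidently classified source
    embeddings, approximating the 'prototypical distribution'."""
    keep = confidence > tau
    # Warm-start each mode at the mean of its confident class cluster.
    means = np.stack([z[keep & (y_pred == c)].mean(axis=0)
                      for c in range(n_classes)])
    gmm = GaussianMixture(n_components=n_classes, means_init=means,
                          random_state=0)
    gmm.fit(z[keep])
    return gmm

def draw_confident_samples(gmm, n, tau=0.99):
    """Draw GMM samples and keep only those whose posterior mode
    responsibility exceeds tau, i.e., samples far from the decision
    boundary, which induces larger inter-class margins."""
    zs, labels = gmm.sample(n)
    keep = gmm.predict_proba(zs).max(axis=1) > tau
    return zs[keep], labels[keep]
```

Discarding low-responsibility samples is what enlarges the margin: the retained pseudo-dataset contains no points near the boundaries between mixture modes, so a classifier matched to it sees well-separated classes.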

3. CROSS-DOMAIN SENTIMENT ANALYSIS

Consider two sentiment analysis problems in a source domain $\mathcal{S}$ with an annotated dataset $\mathcal{D}_S = (X_S, Y_S)$, where $X_S = [x_1^s, \ldots, x_N^s] \in \mathcal{X} \subset \mathbb{R}^{d \times N}$ and $Y_S = [y_1^s, \ldots, y_N^s] \in \mathcal{Y} \subset \mathbb{R}^{k \times N}$, and a target domain $\mathcal{T}$ with an unannotated dataset $\mathcal{D}_T = (X_T)$, where $X_T = [x_1^t, \ldots, x_N^t] \in \mathcal{X} \subset \mathbb{R}^{d \times N}$.

