ADVERSARIAL REPRESENTATION LEARNING FOR CANONICAL CORRELATION ANALYSIS

Abstract

Canonical correlation analysis (CCA) provides a framework for mapping multimodal data into a maximally correlated latent space. Deep CCA replaces the linear maps with deep transformations to enable more flexible correlated data representations; however, optimizing the CCA objective requires computation over sufficiently large sample batches. Here, we present a deep, adversarial approach to CCA, adCCA, that can be solved efficiently by standard mini-batch training. We reformulate CCA under the assumption that the different modalities are embedded with identical latent distributions, and derive a tractable deep CCA objective. We implement the new objective and the distribution constraint in an adversarial framework to learn the canonical representations efficiently. adCCA learns maximally correlated representations across modalities while preserving class information within each individual modality. Further, adCCA removes the need for feature transformation and normalization and can be applied directly to diverse modalities and feature encodings. Numerical studies show that the performance of adCCA is robust to data transformations, binary encodings, and corruptions. Together, adCCA provides a scalable approach to aligning data across modalities without compromising sample class information within each modality.

1. INTRODUCTION

Data samples can be measured with different modalities (e.g., image or text), encoded in different formats, and modeled by different distributions. In many machine learning tasks, integrative analysis of multimodal data offers the opportunity to combine partial information from each modality and achieve better performance than any single modality alone (Ngiam et al. (2011); Srivastava & Salakhutdinov (2012)). Canonical correlation analysis (CCA) (Thompson (1984)) is one of the most classical and general approaches to multimodal data integration. It learns linear mappings between data modalities that achieve maximal cross-modality correlations. Replacing the linear mappings in CCA with deep functions enables non-linear, flexible transformations and yields better-correlated representations. However, learning the deep CCA objective requires optimization over all data samples (Andrew et al. (2013)) or a sufficiently large data batch (Wang et al. (2015)), which is incompatible with the standard batch-based learning strategies widely used in deep learning and limits its power on large-scale datasets. Therefore, many recent deep CCA approaches (Wang et al. (2016); Dutton (2020); Karami & Schuurmans (2021)) sidestep the original CCA formulation and instead focus on learning a joint representation for the paired modalities using approaches compatible with mini-batch training (Appendix A). Here, we propose a multimodal adversarial learning framework for deep CCA: adCCA. Mathematical analysis provides an optimization target for CCA amenable to mini-batch training under the requirement that the different modality distributions be identical in latent space. adCCA formulates this requirement as a penalty function that, during optimization, brings the two latent distributions into alignment.
As the latent-space representations converge (distribution-wise), maximizing the optimization target leads to highly correlated latent representations (sample-wise) for the two modalities, as illustrated by our numerical experiments. Thus, adCCA is derived from a deep CCA framework that follows the original correlation target of CCA, yet can be optimized directly by mini-batch training.
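The mini-batch-friendly structure described above can be sketched in a few lines. The toy paired data, the linear stand-in encoders, and the simple moment-matching penalty (used here in place of a learned adversarial discriminator) are all illustrative assumptions, not the paper's actual implementation; the point is only that the per-sample correlation surrogate averages over samples and therefore gives an unbiased estimate on any mini-batch, unlike the whitening-based classical CCA objective, which needs full-batch covariances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired "modalities": Y is a noisy nonlinear view of X (hypothetical data).
n, d = 256, 4
X = rng.normal(size=(n, d))
Y = np.tanh(X) + 0.1 * rng.normal(size=(n, d))

def embed(Z, W):
    """Stand-in for a deep encoder: a linear map followed by per-sample
    unit normalization, so cross-modality correlation reduces to an
    inner product between matched embeddings."""
    H = Z @ W
    return H / np.linalg.norm(H, axis=1, keepdims=True)

W_x = rng.normal(size=(d, d))  # encoder parameters for modality X
W_y = rng.normal(size=(d, d))  # encoder parameters for modality Y

def correlation_term(Hx, Hy):
    # Mean inner product of paired unit embeddings: a sum over samples,
    # so any mini-batch gives an unbiased estimate of the full-data value.
    return np.mean(np.sum(Hx * Hy, axis=1))

def moment_penalty(Hx, Hy):
    # Crude stand-in for the adversarial distribution-alignment term:
    # penalize mismatch of batch means (adCCA instead learns a discriminator).
    return np.sum((Hx.mean(axis=0) - Hy.mean(axis=0)) ** 2)

# One mini-batch evaluation of the combined objective.
batch = rng.choice(n, size=32, replace=False)
Hx, Hy = embed(X[batch], W_x), embed(Y[batch], W_y)
loss = -correlation_term(Hx, Hy) + moment_penalty(Hx, Hy)
```

In a full training loop, `loss` would be minimized by gradient descent over the encoder parameters, with the penalty replaced by a discriminator trained adversarially to distinguish the two latent distributions.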

