UNSUPERVISED MANIFOLD ALIGNMENT WITH JOINT MULTIDIMENSIONAL SCALING

Abstract

We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem to simultaneously generate isometric embeddings of data and learn correspondences between instances from two different datasets, while only requiring intra-dataset pairwise dissimilarities as input. This characteristic makes our approach applicable to settings where input features are unavailable, such as the inexact graph matching problem. We propose an alternating optimization scheme that fully benefits from existing optimization techniques for MDS and Wasserstein Procrustes. We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment. The implementation of our work is available at https://github.com/BorgwardtLab/JointMDS.

1. INTRODUCTION

Many problems in machine learning require joint visual exploration and manipulation of multiple datasets from different (heterogeneous) domains, which is generally a preferable first step prior to any further data analysis. These domains may consist of measurements of the same samples obtained with different methods or technologies, such as single-cell multi-omics data in bioinformatics (Demetci et al., 2022; Liu et al., 2019; Cao & Gao, 2022). Alternatively, the data may comprise different datasets of similar objects, such as word spaces of different languages in natural language modeling (Alvarez-Melis et al., 2019; Grave et al., 2019), or graphs representing related objects, such as disease-procedure recommendation in biomedicine (Xu et al., 2019b). There are two main challenges in the joint exploration of multiple datasets. First, the data from the heterogeneous domains may be high-dimensional, or may lack input features altogether and provide only pairwise dissimilarities. Second, the correspondences between data instances across the different domains may not be known a priori. In this work, we propose to tackle both issues simultaneously while making few assumptions on the data modality.

To address the first challenge, many dimensionality reduction methods have been proposed over the past decades to provide lower-dimensional embeddings of data. Among them, multidimensional scaling (MDS) (Borg & Groenen, 2005) and its extensions (Tenenbaum et al., 2000; Chen & Buja, 2009) are among the most widely used. They generate low-dimensional embeddings while preserving local and global information about the manifold structure of the data. A key characteristic of MDS is that it only requires pairwise (dis)similarities as input rather than explicit data features, which makes it applicable to problems where input features are not available, such as learning node embeddings of graphs (Gansner et al., 2004).
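The property that MDS consumes only pairwise dissimilarities can be illustrated with graph node embedding, as in the graph drawing setting referenced above. The following minimal sketch (our own toy example, using scikit-learn's SMACOF-based metric MDS; the graph and all variable names are assumptions, not from the paper) embeds the nodes of a small path graph from their shortest-path distances alone:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

# Dissimilarities only: a small path graph given by its adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = shortest_path(A, method="D", unweighted=True)  # graph distances

# Metric MDS (SMACOF) consumes the precomputed dissimilarity matrix directly;
# no node features are ever needed.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X = mds.fit_transform(D)
print(X.shape)  # one 2-D embedding per node
```

The same pattern applies whenever only a dissimilarity matrix is available, e.g. edit distances between strings or dissimilarities between biological samples.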
However, when dealing with multiple datasets from different domains at the same time, the subspaces learned by MDS for the individual datasets are not naturally aligned, so MDS is not directly applicable to joint exploration. One well-known method for aligning data instances from different spaces is Procrustes analysis. Combined with dimensionality reduction, it yields a manifold alignment method (Wang & Mahadevan, 2008; Kohli et al., 2021; Lin et al., 2021). However, these approaches require prior knowledge of the correspondences between data instances across the domains, which limits their applicability in many real-world settings where this information is hard or expensive to obtain. Unsupervised manifold alignment approaches (Wang & Mahadevan, 2009; Cui et al., 2014) overcome this limitation by aligning the underlying manifold structures of two datasets with unknown correspondences while projecting the data onto a common low-dimensional space.

In this work, we propose to combine MDS with the idea of unsupervised manifold alignment to simultaneously embed data instances from two domains, without known correspondences, into a common low-dimensional space, while requiring only intra-dataset dissimilarities. We formulate this as a joint optimization problem, in which we integrate the stress functions of the two datasets, which measure the deviations between embedded and input distances, and adopt the idea of Wasserstein Procrustes analysis (Alvarez-Melis et al., 2019) to align the embedded data instances of the two datasets in a fully unsupervised manner. We solve the resulting optimization problem with an alternating optimization strategy, yielding an algorithm that benefits from existing optimization techniques for each individual sub-problem.
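To give intuition for such an alternating scheme, the sketch below (our own simplification, not the paper's algorithm) alternates between finding a correspondence and an orthogonal alignment between two already-embedded point clouds: a hard assignment via the Hungarian algorithm stands in for the entropic Wasserstein coupling, and the function `align_embeddings` and the toy data are assumptions for illustration:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from scipy.optimize import linear_sum_assignment

def align_embeddings(X, Y, n_iter=10):
    """Alternately (a) match points of Y to points of X and (b) rotate Y
    onto X via orthogonal Procrustes. A hard assignment replaces the
    entropic transport plan used in Wasserstein Procrustes."""
    Q = np.eye(Y.shape[1])
    for _ in range(n_iter):
        YQ = Y @ Q
        # (a) correspondence: min-cost matching on squared Euclidean distances
        cost = ((X[:, None, :] - YQ[None, :, :]) ** 2).sum(-1)
        _, col = linear_sum_assignment(cost)
        # (b) orthogonal map minimizing ||Y[col] Q - X||_F for this matching
        Q, _ = orthogonal_procrustes(Y[col], X)
    return Q, col

# Toy check: Y is a slightly rotated, shuffled copy of a centered grid.
xs, ys = np.meshgrid(np.arange(6.0), np.arange(5.0))
X = np.stack([xs.ravel(), ys.ravel()], axis=1)
X -= X.mean(0)
t = 0.1
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
perm = np.random.default_rng(0).permutation(len(X))
Y = (X @ R)[perm]
Q, col = align_embeddings(X, Y)
print(np.allclose(Y[col] @ Q, X))  # rotation and correspondence recovered
```

The full method additionally re-solves the MDS stress problems inside the loop, so that embeddings and correspondences inform each other rather than being fixed up front.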
Our approach, named Joint MDS, recovers the correspondences between instances across domains while also producing aligned low-dimensional embeddings of the data from both domains; this is its main advantage over Gromov-Wasserstein (GW) optimal transport (Mémoli, 2011; Yan et al., 2018), which only recovers correspondences. We demonstrate the effectiveness of Joint MDS in several machine learning applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment.

2. RELATED WORK

We present here the work most related to ours, namely MDS, unsupervised manifold alignment, and optimal transport (OT) for correspondence finding.

Multidimensional scaling and extensions. MDS is one of the most commonly used dimensionality reduction methods that require only pairwise (dis)similarities between data instances as input. Classical MDS (Torgerson, 1965) was introduced under the assumption that the dissimilarity is a Euclidean distance, in which case it has an analytic solution via SVD. As an extension of classical MDS, metric MDS learns low-dimensional embeddings that preserve an arbitrary dissimilarity by minimizing a stress function. Several further extensions of MDS have been proposed for various practical reasons, such as non-metric MDS (Agarwal et al., 2007), Isomap (Tenenbaum et al., 2000), and local MDS (Chen & Buja, 2009). MDS has also been used for graph drawing (Gansner et al., 2004) by producing node embeddings from shortest-path distances on the graph. Our approach can be seen as an important extension of MDS to multiple datasets.

Unsupervised manifold alignment. While (semi-)supervised manifold alignment methods (Ham et al., 2005; Wang & Mahadevan, 2008; Shon et al., 2005) require at least partial information about the correspondence across domains, unsupervised manifold alignment learns the correspondence directly from the underlying structures of the data. One of the earliest works on unsupervised manifold alignment (Wang & Mahadevan, 2009) used a similarity metric based on permutations of the local geometry to find cross-domain corresponding instances, followed by a non-linear dimensionality reduction. A similar approach was adopted in (Tuia et al., 2014) with a graph-based similarity metric.
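For reference, the analytic classical MDS solution mentioned above (double centering of the squared dissimilarities followed by a spectral decomposition, equivalent to an SVD of the centered matrix) can be sketched as follows; the function name and the random test data are ours:

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: double-center the squared dissimilarities
    and take the top-k eigenpairs of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix when D is Euclidean
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    w, V = w[::-1][:k], V[:, ::-1][:, :k]    # keep the top-k pairs
    return V * np.sqrt(np.clip(w, 0, None))

# Sanity check: for a Euclidean distance matrix of 2-D points, the 2-D
# embedding reproduces all pairwise distances exactly.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 2))
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
Z = classical_mds(D, 2)
DZ = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
print(np.allclose(D, DZ))
```

When the dissimilarities are not Euclidean, this construction no longer recovers them exactly, which is what motivates the stress-minimizing metric MDS variants.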
A more general framework, named GUMA (Cui et al., 2014), was proposed as an optimization problem with three complex terms that projects data instances via a linear transformation and matches them simultaneously. As an extension of (Cui et al., 2014), UnionCom (Cao et al., 2020) introduced geodesic distance matching in place of kernel matrices, targeting multi-modal data in bioinformatics in particular. Additionally, generative adversarial networks and the maximum mean discrepancy have also been used to find correspondences jointly with dimensionality reduction for unsupervised manifold alignment (Amodio & Krishnaswamy, 2018; Liu et al., 2019). Our approach differs from these previous approaches in that it makes few assumptions on the data modality and requires only intra-dataset dissimilarities as input.

Optimal transport for correspondence finding. OT (Peyré et al., 2019) is a powerful and flexible approach for comparing two distributions, with a strong theoretical foundation. It can find a soft correspondence mapping between two sets of samples without any supervision. With the

