

Abstract

Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using kNN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between the attractive and the repulsive forces in t-SNE yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures. We find that UMAP embeddings correspond to t-SNE with increased attraction; mathematical analysis shows that this is because the negative sampling optimisation strategy employed by UMAP strongly lowers the effective repulsion. Likewise, ForceAtlas2, commonly used for visualizing developmental single-cell transcriptomic data, yields embeddings corresponding to t-SNE with the attraction increased even more. At the extreme of this spectrum lies Laplacian Eigenmaps, corresponding to zero repulsion. Our results demonstrate that many prominent neighbor embedding algorithms can be placed onto this attraction-repulsion spectrum, and highlight the inherent trade-offs between them.

1. Introduction

T-distributed stochastic neighbor embedding (t-SNE) (van der Maaten & Hinton, 2008) is arguably among the most popular methods for low-dimensional visualization of complex high-dimensional datasets. It defines pairwise similarities between points in the high-dimensional space, called affinities, and aims to arrange the points in a low-dimensional space to match these affinities (Hinton & Roweis, 2003). Affinities decay exponentially with high-dimensional distance, making them infinitesimal for most pairs of points and making the n × n affinity matrix effectively sparse. Efficient implementations of t-SNE suitable for large sample sizes n (van der Maaten, 2014; Linderman et al., 2019) explicitly truncate the affinities and use the k-nearest-neighbor (kNN) graph of the data with k ≪ n as the input. We use the term neighbor embedding (NE) to refer to all dimensionality reduction methods that operate on the kNN graph of the data and aim to preserve neighborhood relationships (Yang et al., 2013; 2014); this family also includes UMAP and ForceAtlas2, both widely used for visualizing single-cell transcriptomic data (Wagner et al., 2018a; Tusi et al., 2018; Kanton et al., 2019; Sharma et al., 2020).

Here we provide a unifying account of these algorithms. We studied the spectrum of t-SNE embeddings that is obtained when increasing or decreasing the attractive forces between kNN graph neighbors, thereby changing the balance between attraction and repulsion. This led to a trade-off between faithful representations of continuous and discrete structures (Figure 1). Remarkably, we found that ForceAtlas2 and UMAP could both be accurately positioned on this spectrum (Figure 1). For UMAP, we used mathematical analysis and a Barnes-Hut re-implementation to show that the increased attraction is due to its negative sampling optimisation strategy.
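To make the attraction-repulsion balance concrete, the following is a minimal NumPy sketch of the t-SNE gradient with an exaggeration factor ρ multiplying the attractive term. The function name and the dense O(n²) formulation are ours for illustration; efficient implementations instead use Barnes-Hut or interpolation-based approximations of the repulsive term.

```python
import numpy as np

def tsne_gradient(Y, P, rho=1.0):
    """Gradient of the (exaggerated) t-SNE loss.

    Y   : (n, 2) array of embedding coordinates y_i
    P   : (n, n) symmetric affinity matrix summing to 1
    rho : exaggeration factor scaling the attractive term;
          rho = 1 recovers standard t-SNE, while rho -> infinity
          approaches a pure-attraction, Laplacian-Eigenmaps-like regime.
    """
    diff = Y[:, None, :] - Y[None, :, :]        # pairwise y_i - y_j
    dist2 = np.sum(diff ** 2, axis=-1)
    W = 1.0 / (1.0 + dist2)                      # Cauchy (Student t) kernel
    np.fill_diagonal(W, 0.0)
    Z = W.sum()                                  # normalization constant
    # attraction: rho * p_ij * w_ij ;  repulsion: w_ij^2 / Z
    coeff = rho * P * W - (W ** 2) / Z
    return 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)
```

Because the pairwise coefficients are symmetric and the displacement vectors antisymmetric, the forces cancel in aggregate, so the gradient sums to zero over all points; only the balance between the two terms, controlled here by ρ, changes.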

2. Related Work

Various trade-offs in t-SNE generalizations have been studied previously (Yang et al., 2009; Kobak et al., 2020; Venna et al., 2010; Amid et al., 2015; Amid & Warmuth, 2019; Narayan et al., 2015; Im et al., 2018), but our work is the first to study the exaggeration-induced trade-off. Prior work used 'early exaggeration' only as an optimisation trick (van der Maaten & Hinton, 2008) that helps to separate well-defined clusters (Linderman & Steinerberger, 2019; Arora et al., 2018). Carreira-Perpinán (2010) introduced the elastic embedding algorithm, which has an explicit parameter λ controlling the attraction-repulsion balance. However, that paper suggests slowly increasing λ during optimisation, as a trick similar to early exaggeration, and does not discuss trade-offs between high and low values of λ. Our results on UMAP go against the common wisdom on what makes UMAP perform as it does (McInnes et al., 2018; Becht et al., 2019): no previous work suggested that negative sampling may have a drastic effect on the resulting embedding.

3. Neighbor Embeddings

We first cast t-SNE, UMAP, and ForceAtlas2 in a common mathematical framework, using consistent notation and highlighting the similarities between the algorithms, before we investigate the relationships between them empirically and analytically in more detail. We denote the original high-dimensional points as $x_i$ and their low-dimensional positions as $y_i$.

3.1. t-SNE

T-SNE measures similarities between points by affinities $v_{ij}$ and normalized affinities $p_{ij}$:

$$p_{ij} = \frac{v_{ij}}{n}, \qquad v_{ij} = \frac{p_{j|i} + p_{i|j}}{2}, \qquad p_{j|i} = \frac{v_{j|i}}{\sum_{k \neq i} v_{k|i}}, \qquad v_{j|i} = \exp\!\left(-\frac{\lVert x_i - x_j\rVert^2}{2\sigma_i^2}\right).$$

For fixed $i$, $p_{j|i}$ is a probability distribution over all points $j \neq i$ (all $p_{i|i}$ are set to zero), and the variance of the Gaussian kernel $\sigma_i^2$ is chosen to yield a pre-specified value of the perplexity.
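The calibration of $\sigma_i$ can be illustrated with a binary search over the precision $\beta_i = 1/(2\sigma_i^2)$ until each row's entropy matches the target perplexity. This dense, per-row sketch is ours for clarity; practical implementations restrict the search to the k nearest neighbors of each point.

```python
import numpy as np

def conditional_affinities(X, perplexity=30.0, tol=1e-5, max_iter=60):
    """Compute p_{j|i} by binary search over beta_i = 1 / (2 sigma_i^2)
    so that the perplexity exp(H) of each row matches the target."""
    n = X.shape[0]
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared dists
    P = np.zeros((n, n))
    target = np.log(perplexity)                 # target entropy in nats
    for i in range(n):
        lo, hi = 1e-20, 1e20
        beta = 1.0
        d = np.delete(D[i], i)                  # distances to all j != i
        for _ in range(max_iter):
            p = np.exp(-d * beta)
            p /= p.sum()
            H = -np.sum(p * np.log(p + 1e-12))  # Shannon entropy of p_{.|i}
            if abs(H - target) < tol:
                break
            if H > target:                      # distribution too flat:
                lo = beta                       # increase precision beta
                beta = beta * 2 if hi == 1e20 else (beta + hi) / 2
            else:                               # too peaked: decrease beta
                hi = beta
                beta = (beta + lo) / 2
        P[i, np.arange(n) != i] = p
    return P
```

Each row of the returned matrix is a valid conditional distribution, and symmetrizing and dividing by $n$ as in the equations above yields the joint affinities $p_{ij}$.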



Figure 1: Attraction-repulsion spectrum for the MNIST data. Different embeddings of the full MNIST dataset of hand-written digits (n = 70 000); colors correspond to the digit value as shown in the t-SNE panel. Multiplying all attractive forces by an exaggeration factor ρ yields a spectrum of embeddings. Values below 1 yield inflated clusters. Small values above 1 yield more compact clusters. Higher values make multiple clusters merge, with ρ → ∞ corresponding to Laplacian Eigenmaps. Insets show two subsets of digits separated in higher Laplacian eigenvectors. UMAP is very similar to ρ ≈ 4. ForceAtlas2 is very similar to ρ ≈ 30.

