

Abstract

Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using kNN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between the attractive and the repulsive forces in t-SNE yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures. We find that UMAP embeddings correspond to t-SNE with increased attraction; mathematical analysis shows that this is because the negative sampling optimization strategy employed by UMAP strongly lowers the effective repulsion. Likewise, ForceAtlas2, commonly used for visualizing developmental single-cell transcriptomic data, yields embeddings corresponding to t-SNE with the attraction increased even more. At the extreme of this spectrum lies Laplacian Eigenmaps, corresponding to zero repulsion. Our results demonstrate that many prominent neighbor embedding algorithms can be placed onto this attraction-repulsion spectrum, and highlight the inherent trade-offs between them.

1. Introduction

T-distributed stochastic neighbor embedding (t-SNE) (van der Maaten & Hinton, 2008) is arguably among the most popular methods for low-dimensional visualization of complex high-dimensional datasets. It defines pairwise similarities, called affinities, between points in the high-dimensional space and aims to arrange the points in a low-dimensional space so as to match these affinities (Hinton & Roweis, 2003). Affinities decay exponentially with high-dimensional distance, making them negligible for most pairs of points and rendering the n × n affinity matrix effectively sparse. Efficient implementations of t-SNE suitable for large sample sizes n (van der Maaten, 2014; Linderman et al., 2019) explicitly truncate the affinities and use the k-nearest-neighbor (kNN) graph of the data with k ≪ n as the input. We use the term neighbor embedding (NE) to refer to all dimensionality reduction methods that operate on the kNN graph of the data and aim to preserve neighborhood relationships (Yang et al., 2013; 2014). A prominent recent example of this class of algorithms is UMAP (McInnes et al., 2018), which has become popular in applied fields such as single-cell transcriptomics (Becht et al., 2019). It is based on stochastic optimization and typically produces more compact clusters than t-SNE. Another NE algorithm, ForceAtlas2, is commonly used for visualizing developmental single-cell transcriptomic data (et al., 2018; 2020; Wagner et al., 2018a; Tusi et al., 2018; Kanton et al., 2019; Sharma et al., 2020).

Here we provide a unifying account of these algorithms. We studied the spectrum of t-SNE embeddings that are obtained when increasing or decreasing the attractive forces between kNN graph neighbors, thereby changing the balance between attraction and repulsion. This led to a trade-off between faithful representations of continuous and discrete structures (Figure 1). Remarkably, we found that ForceAtlas2 and UMAP could both be accurately positioned on this spectrum (Figure 1). For UMAP, we used mathematical analysis and a Barnes-Hut re-implementation to show that the increased attraction is due to the negative sampling optimization strategy.
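The attraction-repulsion balance discussed above can be made concrete in the t-SNE gradient, which splits into an attractive term over high-affinity (neighboring) pairs and a repulsive term over all pairs. Below is a minimal NumPy sketch under simplifying assumptions: the helper name `tsne_gradient` and the scaling factor `rho` are illustrative (not from the paper's code), and the computation is dense for clarity, whereas practical implementations operate on the sparse kNN graph and approximate the repulsion with Barnes-Hut or FFT-based methods.

```python
import numpy as np

def tsne_gradient(Y, P, rho=1.0):
    """Gradient of the t-SNE loss, split into attraction and repulsion.

    Y:   (n, 2) low-dimensional embedding.
    P:   (n, n) symmetric affinity matrix with zero diagonal.
    rho: scaling of the attractive term; rho > 1 strengthens attraction
         (as in "exaggeration"), rho < 1 weakens it.
    """
    diff = Y[:, None, :] - Y[None, :, :]   # pairwise differences, (n, n, 2)
    dist2 = (diff ** 2).sum(axis=-1)
    W = 1.0 / (1.0 + dist2)                # Cauchy kernel used by t-SNE
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                        # normalized low-dim similarities
    # Attraction acts only where P is nonzero (kNN pairs in practice);
    # repulsion acts between all pairs of points.
    attraction = 4.0 * ((rho * P * W)[:, :, None] * diff).sum(axis=1)
    repulsion = -4.0 * ((Q * W)[:, :, None] * diff).sum(axis=1)
    return attraction + repulsion
```

In this toy setting, increasing `rho` moves the embedding toward the attraction-dominated end of the spectrum (in the limit of zero repulsion, toward Laplacian-Eigenmaps-like layouts), while decreasing it emphasizes repulsion and cluster separation.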

