LOVASZ THETA CONTRASTIVE LEARNING

Abstract

We establish a connection between the Lovasz theta function of a graph and the widely used InfoNCE loss. We show that under certain conditions, the minima of the InfoNCE loss are related to minimizing the Lovasz theta function on the empty similarity graph between the samples. Building on this connection, we generalize contrastive learning on weighted similarity graphs between samples. Our Lovasz theta contrastive loss uses a weighted graph that can be learned to take into account similarities between our data. We evaluate our method on image classification tasks, demonstrating an improvement of 1% in the supervised case and up to 4% in the unsupervised case.

1. INTRODUCTION

The Lovasz theta function is a fundamental quantity in graph theory. It can be viewed as the natural semidefinite relaxation of the graph independence number and was introduced by Laszlo Lovasz to determine the Shannon capacity of the 5-cycle graph (Lovász, 1979), solving a problem that had been open in combinatorics for more than 20 years. This work subsequently inspired semidefinite approximation algorithms (Goemans & Williamson, 1995) and perfect graph theory (Berge, 2001). The Lovasz theta function requires the computation of a graph representation: for a given undirected graph G(V, E) we seek unit-norm vectors v_i, i ∈ V, such that non-adjacent vertices have orthogonal representations: v_i^T v_j = 0 if {i, j} ∉ E. Every graph admits such a representation if the dimension of the vectors v_i is not constrained. The Lovasz theta function searches for a graph representation that makes all these vectors fit in the smallest possible spherical cap.

Contrastive learning trains representations so that similar samples are clustered together while different ones are pulled apart. This can be done either in an unsupervised fashion (i.e., without labels) or in a supervised way (Khosla et al., 2020). Contrastive learning approaches typically treat similarity between elements as binary: two samples are either similar (positive) or different (negative). However, for some problems it is natural to consider degrees of similarity: images of cats are closer to dogs than to airplanes, and this insight can benefit representation learning.

Our Contributions: We establish a connection between contrastive learning and the Lovasz theta function. Specifically, we prove that the minimizers of the InfoNCE loss in the single-positive case are the same (up to rotations) as those of the Lovasz theta optimum graph representation of an empty similarity graph.
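As a concrete reference point, the single-positive InfoNCE loss discussed above can be sketched as follows. This is an illustrative NumPy version, not the paper's implementation; the batch layout (row i of `z_pos` is the positive for row i of `z`, all other rows act as negatives) and the temperature value are assumptions.

```python
import numpy as np

def info_nce(z, z_pos, tau=0.5):
    """Single-positive InfoNCE loss over a batch (illustrative sketch).

    z, z_pos: (N, d) arrays; row i of z_pos is the positive for row i of z,
    and all other rows of z_pos serve as negatives. tau is the temperature.
    """
    # L2-normalize the embeddings so similarities are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = z @ z_pos.T / tau  # (N, N) scaled similarity matrix
    # Diagonal entries are the positive pairs; off-diagonals are negatives.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Under this sketch, the loss is minimized when each anchor is maximally aligned with its positive and repelled from every other sample in the batch, which is exactly the empty-similarity-graph regime the theorem above refers to.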
Using this connection, we generalize contrastive learning via the Lovasz theta function on weighted graphs (Johansson et al., 2015). We define the Lovasz theta contrastive loss, which leverages a weighted graph representing similarities between the samples in each batch. Our loss generalizes the regular contrastive loss: if the positive examples are transformations of a single sample and transformations of other images serve as negatives (so the underlying graph corresponds to the empty one), we recover the regular contrastive loss. In this way, any image similarity metric can be used to strengthen contrastive learning. For unsupervised contrastive learning, we show that our method can yield a benefit of up to 4% over SimCLR on CIFAR100 using a pre-trained CLIP image encoder.
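The paper's exact weighted objective is not reproduced here, so the following is only a hypothetical sketch of how a batch-level similarity graph W could modulate the negatives in a contrastive loss. The function name `graph_weighted_loss`, the (1 − W) down-weighting scheme, and the temperature are illustrative assumptions, not the authors' loss; the sketch is only meant to show the reduction to the standard case when W is the identity (an empty similarity graph between distinct samples).

```python
import numpy as np

def graph_weighted_loss(z, z_pos, W, tau=0.5):
    """Hypothetical graph-weighted contrastive loss (illustrative sketch).

    W: symmetric (N, N) similarity matrix with entries in [0, 1].
    Pairs the graph marks as similar (large W[i, j]) are repelled less;
    W = identity recovers the standard single-positive InfoNCE treatment
    where every other sample is a full-strength negative.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = np.exp(z @ z_pos.T / tau)
    # Down-weight the repulsion of pairs the graph marks as similar.
    neg_weight = 1.0 - W
    np.fill_diagonal(neg_weight, 0.0)  # the positive pair is never a negative
    denom = np.diag(logits) + (neg_weight * logits).sum(axis=1)
    return -np.mean(np.log(np.diag(logits) / denom))
```

With W = np.eye(N), the denominator is the full softmax normalizer and the sketch coincides with InfoNCE; with larger off-diagonal weights, graph-similar samples contribute progressively less repulsion.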

