LOVASZ THETA CONTRASTIVE LEARNING

Abstract

We establish a connection between the Lovasz theta function of a graph and the widely used InfoNCE loss. We show that under certain conditions, the minima of the InfoNCE loss are related to minimizing the Lovasz theta function on the empty similarity graph between the samples. Building on this connection, we generalize contrastive learning on weighted similarity graphs between samples. Our Lovasz theta contrastive loss uses a weighted graph that can be learned to take into account similarities between our data. We evaluate our method on image classification tasks, demonstrating an improvement of 1% in the supervised case and up to 4% in the unsupervised case.

1. INTRODUCTION

The Lovasz theta function is a fundamental quantity in graph theory. It can be considered the natural semidefinite relaxation of the graph independence number, and was defined by Laszlo Lovasz to determine the Shannon capacity of the 5-cycle graph (Lovász, 1979), solving a problem that had been open in combinatorics for more than 20 years. This work subsequently inspired semidefinite approximation algorithms (Goemans & Williamson, 1995) and perfect graph theory (Berge, 2001). The Lovasz theta function requires the computation of a graph representation: for a given undirected graph G(V, E) we would like to find unit norm vectors v_i, i ∈ V, such that non-adjacent vertices have orthogonal representations: v_i^T v_j = 0 if {i, j} ∉ E. Every graph has such a representation if the dimension of the vectors v_i is not constrained. The Lovasz theta function searches for a graph representation that makes all these vectors fit in a small spherical cap.

Contrastive learning has become a widely used approach to representation learning. This training process aims to learn representations in which similar samples are clustered together while different ones are pulled apart. This can be done either in an unsupervised fashion (i.e. without labels) or in a supervised way (Khosla et al., 2020). Contrastive learning approaches typically consider similarity between elements to be binary: two samples are either similar (positive) or different (negative). However, for some problems it is natural to consider variability in similarity: images of cats are closer to dogs than to airplanes, and this insight can benefit representation learning.

Our Contributions: We establish a connection between contrastive learning and the Lovasz theta function. Specifically, we prove that the minimizers of the InfoNCE loss in the single-positive case are the same (up to rotations) as those of the Lovasz theta optimum graph representation using an empty similarity graph.
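As a concrete reference point for the loss discussed above, a single-positive InfoNCE term can be sketched as follows (a minimal numpy helper of our own, not code from the paper; the function name and interface are illustrative):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Single-positive InfoNCE loss for one anchor (illustrative sketch).

    anchor, positive: (d,) unit-norm embeddings.
    negatives: (k, d) unit-norm embeddings of negative samples.
    """
    pos_logit = anchor @ positive / temperature
    neg_logits = negatives @ anchor / temperature
    logits = np.concatenate([[pos_logit], neg_logits])
    # Cross-entropy with the positive pair as the target class:
    # -log( exp(pos) / (exp(pos) + sum_j exp(neg_j)) )
    return np.log(np.exp(logits).sum()) - pos_logit

# A perfectly aligned positive with orthogonal negatives gives a near-zero loss.
e = np.eye(3)
aligned = info_nce(e[0], e[0], e[[1, 2]])
misaligned = info_nce(e[0], e[1], e[[0, 2]])
```

The loss is minimized when the anchor-positive similarity is large relative to all anchor-negative similarities, which is the alignment/uniformity trade-off the connection above concerns.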
Using this connection, we generalize contrastive learning using Lovasz theta on weighted graphs (Johansson et al., 2015). We define the Lovasz theta contrastive loss, which leverages a weighted graph representing similarities between samples in each batch. Our loss is a generalization of the regular contrastive loss: if positive examples are transformations of one sample and transformations of other images are used as negative examples (so that the underlying graph is the empty one), we recover the regular contrastive loss. This way, any image similarity metric can be used to strengthen contrastive learning. For unsupervised contrastive learning, we show that our method can yield a benefit of up to 4% over SimCLR on CIFAR100, using a pre-trained CLIP image encoder to obtain similarities. For supervised contrastive learning (i.e. if class structure is used), our method yields a 1% benefit over supervised contrastive learning (Khosla et al., 2020) on CIFAR100. The key idea of our method can be seen in Figure 1: we want to connect two samples more strongly if they are semantically related.

Figure 1: Key idea of our method. In this figure we see how our proposed method works with respect to the similarity graph between the classes. In regular supervised contrastive learning (left), the similarity graph considered is just the empty graph (no class is similar to another). On the right, we see our proposed method: while different, the classes of dogs are similar to each other and different from cats, so the graph considered by our method reflects that (edge boldness reflects weight magnitude).

There exists a body of work on graph representation learning (Chen et al., 2020a), which learns functions on a given graph so that distances between the representations are preserved.
While related, our work differs from this, since our contrastive learning loss does not seek to explicitly retain the structure of the graph, but rather uses it to guide how aligned the representations of the samples should be. In the rest of this work, we first examine the relationship between the Lovasz theta function and the InfoNCE loss. We then extend the latter via intuition derived from a weighted version of the Lovasz theta function. Finally, we empirically evaluate our proposed loss, using several ways to model sample similarity.
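This excerpt does not write out the Lovasz theta contrastive loss explicitly. As a rough illustration of the idea only, one plausible way to weight a contrastive objective by a similarity graph is sketched below; the function name, the normalization, and the exact weighting scheme are all our assumptions, not the paper's definition:

```python
import numpy as np

def graph_weighted_contrastive(Z, W, temperature=0.1):
    """Sketch of a graph-weighted contrastive batch loss (our assumed form,
    not the paper's exact objective).

    Z: (n, d) L2-normalized embeddings.
    W: (n, n) symmetric similarity weights in [0, 1], zero diagonal.
       A binary W recovers a supervised-contrastive-style loss; one positive
       per anchor (the "empty graph" setting) recovers plain InfoNCE.
    """
    n = Z.shape[0]
    logits = Z @ Z.T / temperature
    np.fill_diagonal(logits, -np.inf)   # never contrast a sample with itself
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    mask = ~np.eye(n, dtype=bool)
    # Weight each pair's log-probability by its graph weight.
    return -(W * np.where(mask, log_prob, 0.0)).sum() / W.sum()

# Embeddings aligned with the graph (pairs 0-1 and 2-3 connected and
# identical) should score better than embeddings that contradict it.
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = W[2, 3] = W[3, 2] = 1.0
Z_agree = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
Z_disagree = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
loss_agree = graph_weighted_contrastive(Z_agree, W)
loss_disagree = graph_weighted_contrastive(Z_disagree, W)
```

Heavier edges pull their endpoints' representations together more strongly, which matches the intuition in Figure 1.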

2.1. LOVASZ THETA FUNCTION

The Lovasz theta function (Lovász, 1979) is a quantity that has been used to approximate the chromatic number of a graph. This is done by obtaining a representation for each vertex, clustered around a certain "handle" vector. One of its most important properties is that it is efficiently computable by solving a semidefinite program (SDP) (Gärtner & Matousek, 2012), even though the chromatic number it approximates is NP-hard to compute in general.

2.2. SELF-SUPERVISED CONTRASTIVE LEARNING

There has been a flurry of activity in self-supervision, e.g. (Chen et al., 2020b; He et al., 2020; Oord et al., 2018; Chen & He, 2021; Wu et al., 2018). The main goal is to learn general representations that are useful for a variety of downstream tasks. Oord et al. (2018), He et al. (2020), and Chen & He (2021) extend this by maximizing mutual information, adding a momentum encoder, and adding a stop-gradient to one copy of the network, respectively. There has also been work on obtaining features at multiple granularities of an image to improve the learnt representations (Zhou et al., 2022). Crucially, these works rely on large datasets of unlabeled data to learn quality representations that can then be applied to supervised learning tasks.



Ideally, we want to learn representations that are more general than those obtained by training classifiers. Wu et al. (2018) improved classification accuracy on ImageNet by a large margin over the prior state of the art. This was further simplified and improved by Chen et al. (2020b), who emphasized the importance of data augmentations. More recent works

