SIGNS IN THE LOTTERY: STRUCTURAL SIMILARITIES BETWEEN WINNING TICKETS

Abstract

Winning tickets are sparse subnetworks of a deep network that can be trained in isolation to the same performance as the full network. Winning tickets have been found in many different contexts; however, their structural characteristics are not well understood. We propose that the signs of the connections in winning tickets play a crucial role. We back this claim by introducing a sign-based structural comparison metric that allows distinguishing winning tickets from other sparse networks. We further analyze typical (signed) patterns in convolutional kernels of winning tickets and find structures that resemble patterns found in trained networks.

1. INTRODUCTION

The lottery ticket hypothesis (Frankle & Carbin, 2019) claims the existence of sparse trainable subnetworks in a given deep network, the so-called winning tickets. While it has long been known that artificial neural networks can be pruned significantly (removing more than 90% of the parameters) after training without impacting their performance (Liang et al., 2021; Blalock et al., 2020), such sparse networks usually resist training when their weights are initialized randomly: they need longer to converge and usually reach lower performance. One interpretation of winning tickets is that they form the trainable core of a dense network, with the other parameters essentially being dead weight. This overparameterization of the dense network is still useful, as it combinatorially expands the number of subnetworks and hence the chance of containing a winning ticket.
Winning tickets have been reliably found for different types of deep networks by pruning connections from randomly initialized dense networks. However, the known pruning approaches are quite laborious, often requiring more resources than simply training the original dense network. A good characterization of what distinguishes winning tickets from other sparse networks still seems to be missing. Better understanding these properties could not only help in designing more efficient pruning algorithms but might also allow devising new initialization schemes for deep networks, leading to smaller networks and more efficient training.
In this paper, we address the question of whether winning tickets show specific structural characteristics that allow us to distinguish them from other sparse networks. We claim that it is not merely their sparse structure but also the sign of the connections that should be considered.
After briefly reviewing previous work and motivating our approach (section 2), we introduce a sign-aware structural distance metric for sparse networks (section 3), explain our experimental setup (section 4), and apply our metric to the generated winning tickets (section 5). We then complement this quantitative analysis with a more qualitative inspection of spatial structures found in winning tickets (section 6) and conclude by summarizing our findings (section 7).
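To make the distinction between a purely structural description of a sparse network and a signed one concrete, the following minimal sketch builds both for a single random weight matrix. It is an illustration under stated assumptions, not the procedure or the metric of this paper: the one-shot magnitude pruning, the 90% sparsity level, and the helper names `magnitude_prune_mask` and `signed_mask` are all hypothetical choices made for the example.

```python
import numpy as np


def magnitude_prune_mask(weights, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights.

    Assumption: simple one-shot magnitude pruning, used here only for illustration.
    """
    flat = np.abs(weights).ravel()
    n_keep = int(round(flat.size * (1.0 - sparsity)))
    mask = np.zeros(flat.size, dtype=np.int8)
    if n_keep > 0:
        # Indices of the n_keep entries with the largest absolute value.
        keep_idx = np.argpartition(flat, flat.size - n_keep)[flat.size - n_keep:]
        mask[keep_idx] = 1
    return mask.reshape(weights.shape)


def signed_mask(weights, mask):
    """Ternary mask in {-1, 0, +1}: the sign of every surviving connection."""
    return (np.sign(weights) * mask).astype(np.int8)


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))          # stand-in for one layer's initial weights

mask = magnitude_prune_mask(w, sparsity=0.9)
s = signed_mask(w, mask)

# Negating all weights leaves the binary mask untouched
# but inverts every entry of the signed mask.
mask_flipped = magnitude_prune_mask(-w, sparsity=0.9)
s_flipped = signed_mask(-w, mask_flipped)

print("connections kept:", int(mask.sum()))
print("same binary mask after sign flip:", np.array_equal(mask, mask_flipped))
print("same signed mask after sign flip:", np.array_equal(s, s_flipped))
```

In this toy setting, two networks can share exactly the same binary mask while differing in every sign, which is the kind of difference a purely structural comparison cannot see.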

2. RELATED WORK

Since their discovery by Frankle & Carbin (2019), winning tickets have attracted considerable attention, and several works have shown their existence for different types of networks and datasets, although for larger networks some additional warmup training seems to be required (Frankle et al., 2019). On the other hand, Ramanujan et al. (2020) have shown that there exist subnetworks that already

