AN EXACT POLY-TIME MEMBERSHIP-QUERIES ALGORITHM FOR EXTRACTING A THREE-LAYER RELU NETWORK

Abstract

We consider the natural problem of learning a ReLU network from queries, which was recently remotivated by model extraction attacks. In this work, we present a polynomial-time algorithm that can learn a depth-two ReLU network from queries under mild general position assumptions. We also present a polynomial-time algorithm that, under mild general position assumptions, can learn a rich class of depth-three ReLU networks from queries. For instance, it can learn most networks where the number of first-layer neurons is smaller than the dimension and the number of second-layer neurons. These two results substantially improve the state of the art: until our work, polynomial-time algorithms were only shown to learn depth-two networks from queries under the assumption that either the underlying distribution is Gaussian (Chen et al. (2021)) or that the rows of the weight matrix are linearly independent (Milli et al. (2019)). For depth three or more, there were no known poly-time results.

1. INTRODUCTION

With the growth of neural-network-based applications, many commercial companies offer machine learning services, allowing public use of trained networks as a black box. These services allow the user to query the model and, in some cases, return the exact output of the network so that users can reason about the model's output. Yet, the parameters of the model and its architecture are considered the companies' intellectual property, and they often do not wish to reveal them. Moreover, the training phase sometimes uses sensitive data, and as demonstrated in Zhang et al. (2020), inversion attacks can expose that sensitive data to anyone who holds the trained model. Nevertheless, the model is still vulnerable to membership query attacks even as a black box. A recent line of works (Tramer et al. (2016), Shi et al. (2017), Milli et al. (2019), Rolnick & Körding (2020), Carlini et al. (2020), Fornasier et al. (2021)) showed, either empirically or theoretically, that using a specific set of queries, one can reconstruct some hidden models. Theoretical work includes Chen et al. (2021), who proposed a novel algorithm that, under the Gaussian distribution, can approximate a two-layer model with ReLU activation in guaranteed polynomial time and query complexity without any further assumptions on the parameters. Likewise, Milli et al. (2019) showed how to exactly extract the parameters of depth-two networks, assuming that the weight matrix has independent rows (in particular, the number of neurons is at most the input dimension). Our work extends theirs by showing: 1. A polynomial time and query complexity algorithm for exact reconstruction of a two-layer neural network with any number of hidden neurons, under mild general position assumptions; and 2.
A polynomial time and query complexity algorithm for exact reconstruction of a three-layer neural network under mild general position assumptions, with the additional assumptions that the number of first-layer neurons is smaller than the input dimension and that the second layer has non-zero partial derivatives. The last assumption holds for most networks with more second-layer neurons than first-layer neurons. The mild general position assumptions are further explained in section 3.3. However, we note that the proposed algorithm works on any two-layer neural network except for a set of zero Lebesgue measure. Furthermore, it works in polynomial time provided that the input weights are slightly perturbed (for instance, each weight is perturbed by adding a uniform number in $[-2^{-d}, 2^{-d}]$). At a very high level, the basis of our approach is to find points at which the linearity of the network breaks and to extract neurons by recovering the affine transformations computed by the network near these points. This approach was taken in the previous theoretical papers Milli et al. (2019); Chen et al. (2021), and also in the empirical works of Carlini et al. (2020); Jagielski et al. (2020). To derive our results, we add several ideas to the existing techniques, including the ability to distinguish first-layer from second-layer neurons, which allows us to deal with three-layer networks, as well as the ability to correctly reconstruct the neurons in general depth-two networks of any finite width in polynomial time, without assuming that the rows are independent.
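The "linearity breaking" idea above can be illustrated with a simple one-dimensional search: along a segment, a piecewise-linear function agrees with its chord on kink-free subintervals, so bisecting on a failed chord test localizes a breakpoint. The sketch below is our own minimal illustration (the helper `find_kink` and its chord test are hypothetical, not the paper's procedure), and it assumes exactly one kink lies strictly inside the queried segment.

```python
import numpy as np

def find_kink(f, p, q, tol=1e-8):
    """Bisect along the segment [p, q] to localize a point where the
    piecewise-linear map t -> f(p + t*(q - p)) changes slope.
    Assumes exactly one such kink lies strictly between p and q."""
    def g(t):
        return f(p + t * (q - p))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        m2 = 0.5 * (lo + mid)
        # Chord test: if g is affine on [lo, mid], its value at the
        # subinterval's midpoint matches the chord; otherwise the kink
        # lies in [lo, mid], so we keep that half.
        if abs(g(m2) - 0.5 * (g(lo) + g(mid))) > 1e-12:
            hi = mid
        else:
            lo = mid
    return p + lo * (q - p)
```

For instance, applied to a single ReLU neuron along the segment from 0 to 1, the search converges to the point where the neuron's pre-activation crosses zero; near that point the network's affine pieces on either side can then be recovered from a few extra queries.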

2. RESULTS

We next describe our results. Our results assume a general position assumption quantified by a parameter $\delta \in (0, 1)$; a network that satisfies our assumption with parameter $\delta$ is called $\delta$-regular. This assumption is defined in section 3.3. We note, however, that a slight perturbation of the network weights, say, adding to each weight a uniform number in $[-2^{-d}, 2^{-d}]$, guarantees that with probability $1 - 2^{-d}$ the network is $\delta$-regular with $\delta$ large enough to guarantee polynomial time complexity. Thus, $\delta$-regularity is argued to be a mild general position assumption. Throughout the paper, we denote by $Q$ the time it takes to make a single query.
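The perturbation the regularity argument refers to can be sketched concretely; the snippet below is a minimal illustration (the dimensions and the Gaussian weights are our own illustrative choices), showing each weight receiving an independent uniform perturbation in $[-2^{-d}, 2^{-d}]$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                            # input dimension (illustrative)
W = rng.standard_normal((8, d))   # hypothetical first-layer weight matrix

# Perturb each weight by an independent uniform sample in [-2^-d, 2^-d];
# the paper argues such a perturbation yields a delta-regular network
# (with delta large enough for poly-time) with probability 1 - 2^-d.
eps = 2.0 ** (-d)
W_perturbed = W + rng.uniform(-eps, eps, size=W.shape)
```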

2.1. DEPTH TWO NETWORKS

Consider a 2-layer network model given by
$$M(x) = \sum_{j=1}^{d_1} u_j \phi(\langle w_j, x\rangle + b_j), \tag{1}$$
where $\phi(x) = x^+ = \max(x, 0)$ is the ReLU function, and for any $j \in [d_1]$, $w_j \in \mathbb{R}^d$, $b_j \in \mathbb{R}$, and $u_j \in \mathbb{R}$. We assume that the $w_j$'s, the $b_j$'s, and the $u_j$'s, along with the width $d_1$, are unknown to the user, who has only black-box access to $M(x)$ for any $x \in \mathbb{R}^d$. We do not make any further assumptions on the network weights other than $\delta$-regularity.

Theorem 1. There is an algorithm that, given oracle access to a $\delta$-regular network as in equation 1, reconstructs it using $O\left((d_1 \log(1/\delta) + d_1 d)Q + d^2 d_1\right)$ time and $O\left(d_1 \log(1/\delta) + d_1 d\right)$ queries.

We note that by reconstruction we mean that the algorithm finds $d'_1$ and weights $w'_0, \ldots, w'_{d'_1} \in \mathbb{R}^d$, $b'_0, \ldots, b'_{d'_1} \in \mathbb{R}$, and $u'_1, \ldots, u'_{d'_1} \in \mathbb{R}$ such that
$$\forall x \in \mathbb{R}^d, \quad M(x) = \langle w'_0, x\rangle + b'_0 + \sum_{j=1}^{d'_1} u'_j \phi\left(\langle w'_j, x\rangle + b'_j\right). \tag{2}$$
We will also prove a similar result for the case in which the algorithm is allowed to query the network only on points in $\mathbb{R}^d_+$, while, on the other hand, equation 2 needs to be satisfied only for $x \in \mathbb{R}^d_+$. This case is essential for reconstructing depth-three networks, and we call it the $\mathbb{R}^d_+$-restricted case.

Theorem 2. In the $\mathbb{R}^d_+$-restricted case there is an algorithm that, given oracle access to a $\delta$-regular network as in equation 1, reconstructs it using $O\left((dd_1 \log(1/\delta) + d_1 d)Q + d^2 d_1^2\right)$ time and $O\left(dd_1 \log(1/\delta) + d_1 d\right)$ queries.
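As a concrete instance of the model in equation 1, the following NumPy sketch implements the black-box oracle $M$ that the extraction algorithm queries; the dimensions and the randomly drawn parameters here are our own illustrative choices, not part of the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d1 = 5, 8  # input dimension and hidden width; illustrative sizes only

# Hypothetical parameters: in the extraction setting these are unknown
# to the user, who can only query M as a black box.
W = rng.standard_normal((d1, d))  # rows w_j
b = rng.standard_normal(d1)       # biases b_j
u = rng.standard_normal(d1)       # output weights u_j

def M(x):
    # M(x) = sum_j u_j * relu(<w_j, x> + b_j), as in equation 1
    return float(u @ np.maximum(W @ x + b, 0.0))

# A single membership query (its cost is the quantity Q in the bounds):
value = M(rng.standard_normal(d))
```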




