EXPRESSIVE MONOTONIC NEURAL NETWORKS

Abstract

The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior. This is especially important for interpretability and fairness considerations. In a broader context, scenarios in which monotonicity is important can be found in finance, medicine, physics, and other disciplines. It is thus desirable to build neural network architectures that implement this inductive bias provably. In this work, we propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence on any subset of the inputs. The weight constraint scheme directly controls the Lipschitz constant of the neural network and thus provides the additional benefit of robustness. Compared to currently existing techniques used for monotonicity, our method is simpler in both implementation and theoretical foundations, has negligible computational overhead, is guaranteed to produce monotonic dependence, and is highly expressive. We show how the algorithm is used to train powerful, robust, and interpretable discriminators that achieve competitive performance compared to current state-of-the-art methods across various benchmarks, from social applications to the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider.

1. INTRODUCTION

The need to model functions that are monotonic in a subset of their inputs is prevalent in many ML applications. Enforcing monotonic behavior can help improve generalization capabilities (Milani Fard et al., 2016; You et al., 2017) and assist with interpretation of the decision-making process of the neural network (Nguyen & Martínez, 2019). Real-world scenarios include various applications with fairness, interpretability, and security aspects. Examples can be found in the natural sciences and in many social applications. Monotonic dependence of a model output on a certain feature in the input can be informative of how an algorithm works, and in some cases is essential for real-world usage. For instance, a good recommender engine will favor the product with a high number of reviews over another with fewer but otherwise identical reviews (ceteris paribus). The same applies to systems that assess health risk, evaluate the likelihood of recidivism, rank applicants, filter inappropriate content, etc. In addition, robustness to small perturbations in the input is a desirable property for models deployed in real-world applications, particularly when they are used to inform decisions that directly affect human actors, or where the consequences of making an unexpected and unwanted decision could be extremely costly. The continued existence of adversarial methods is a good example of the potential for malicious attacks on current algorithms (Akhtar et al., 2021). A natural way of ensuring the robustness of a model is to constrain its Lipschitz constant. To this end, we recently developed an architecture whose Lipschitz constant is constrained by design using layer-wise normalization, which allows the architecture to be more expressive than the current state-of-the-art with stable and fast training (Kitouni et al., 2021).
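The core construction described above can be sketched in a few lines: each weight matrix is rescaled so that the product of the layer norms is at most some budget λ, which bounds the Lipschitz constant of the network g; adding a residual term λ·Σ x_i over the monotonic inputs then guarantees that the output is non-decreasing in those inputs, since g can fall by at most λ per unit increase in any input. The sketch below is a minimal illustration of this idea, not the paper's exact implementation; in particular, the choice of the ∞-operator norm (max absolute row sum) and the function names are assumptions made here for concreteness.

```python
import numpy as np

def normalize_weight(W, lam, depth):
    # Rescale W so its inf-operator norm (max absolute row sum) is at most
    # lam ** (1/depth); the product over all layers is then at most lam.
    # (Assumed norm choice for this sketch; the paper's scheme may differ.)
    norm = np.abs(W).sum(axis=1).max()
    target = lam ** (1.0 / depth)
    return W * min(1.0, target / norm)

def lipschitz_net(x, weights, lam):
    # Forward pass of a small fully-connected net whose Lipschitz constant
    # w.r.t. the inf-norm is bounded by lam, because every layer's weight
    # norm is clipped and ReLU is 1-Lipschitz.
    depth = len(weights)
    h = x
    for i, W in enumerate(weights):
        h = normalize_weight(W, lam, depth) @ h
        if i < depth - 1:
            h = np.maximum(h, 0.0)  # ReLU
    return h

def monotone_net(x, weights, lam, monotone_idx):
    # Single residual connection: adding lam * sum of the monotonic inputs
    # guarantees the output is non-decreasing in those inputs, since the
    # Lipschitz-bounded term g can decrease by at most lam per unit change.
    g = lipschitz_net(x, weights, lam).item()
    return g + lam * x[monotone_idx].sum()
```

For example, increasing the first input while holding the rest fixed can never decrease the output of `monotone_net` when index 0 is in `monotone_idx`, regardless of the (random) weights.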
Our algorithm has been adopted to classify the decays of subatomic particles produced at the CERN Large Hadron Collider in the real-time data-processing system of the LHCb experiment, which was our original motivation for developing this novel architecture.

