LOGICAL VIEW ON FAIRNESS OF A BINARY CLASSIFICATION TASK

Abstract

Ethical, Interpretable/Explainable, and Responsible AI are active areas of research and important social initiatives. Vendors offer solutions; for instance, Microsoft has assembled a Responsible AI platform. Within this context, the challenges of algorithmic fairness and trustworthiness of machine learning are paramount. Furthermore, several authors argue that the emergence of algorithmically infused societies necessitates innovative approaches to measuring feasible information, e.g., that data collection should follow a trustworthy social theory. In this paper, we show that this approach is heuristic at best. We prove that, regardless of the data, fairness and trustworthiness are algorithmically undecidable for a basic machine learning task, binary classification. Therefore, even an approach based on not merely improving but fully solving the three usually assumed issues - the insufficient quality of measurements, the complex consequences of (mis)measurements, and the limits of existing social theories - is only a heuristic. We show that, effectively, the fairness of a classifier is not even a (version of a bias-variance) trade-off inasmuch as it is a logical phenomenon. Namely, we exhibit a language L and an L-theory T for the binary classification task such that the very notion of loss is not expressible by a first-order L-formula.

1. INTRODUCTION

Ethical, Interpretable/Explainable, and Responsible AI are active areas of research and important social initiatives. Vendors offer solutions; for instance, Microsoft has assembled a Responsible AI platform. Within this context, the challenges of algorithmic fairness and trustworthiness of machine learning are paramount. Furthermore, several authors argue that the emergence of algorithmically infused societies necessitates innovative approaches to measuring feasible information, e.g., that data collection should follow a trustworthy social theory [3]. Difficulties associated with such an approach can be found in [7]. Moreover, in this paper, we show that this approach is heuristic at best. We prove that, regardless of the data, fairness and trustworthiness are algorithmically undecidable for a binary classification task (cf. [4], [5]). Therefore, even an approach based on not merely improving but fully solving the three usually assumed issues - the insufficient quality of measurements, the complex consequences of (mis)measurements, and the limits of existing social theories - is only a heuristic. We prove that, effectively, the fairness of a binary classifier is not even a trade-off (e.g., a version of bias-variance/complexity, etc.) inasmuch as it is a logical phenomenon. Namely, we exhibit a language L and an L-theory T for the binary classification task such that the very notion of loss is not expressible by a first-order L-formula. Note that the essence of a "mass view" approach is that, unlike in a traditional machine learning context, we make no assumptions on the nature of the classifier loss other than that it should provide a way to compare two (potentially different) classifiers. Under this very broad perspective, it turns out that, in a natural model, the loss of a classifier is inexpressible as a first-order logic formula (cf. the Appendix for definitions).
It follows that any feasible definition of fairness for a machine learning classification task is undecidable. Indeed, one has to assume that two classifiers are comparable in their performance characteristics in the first place. If the latter is not expressible, then one cannot reach a sensible conclusion on fairness. By the same token, since all derived heuristics, such as transparency, interpretability, and trust, must include a notion of fairness, they inherit the same limitation. More specifically, we present an almost surely decidable model in which the classifier loss is not expressible. Thus, undecidability of a classifier loss is not necessarily tied to undecidability of the model. However, if we adopt yet another view of binary classifiers over an infinite domain, the class, viewed as a lower-bounded lattice, is (first-order) undecidable. Throughout the paper, we consider the natural generalization of the binary classifier to an infinite domain. Our goal is to introduce a purely logical view on loss for a binary classifier on an infinite domain. This is achieved by introducing a general notion of classifier loss based on the observation that any natural loss is a first-order formula in a suitable structure. The latter has a theory T represented by a tuple (L, M), where L is a language and M is a model of L. Next, we show that the resulting first-order theory T admits an extension RG_ext on the random graph structure such that the statement that a graph has an equal number of connected and unconnected nodes is not expressible in RG_ext first-order logic. The binary classifiers' structure is isomorphic (with probability 1) to T. Therefore, if a first-order sentence in one theory is deducible (i.e., can be proved) in that theory, the corresponding sentence is deducible in the other.
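The counting property above is trivially checkable on finite graphs by enumeration; the claim in the text is precisely that no first-order sentence captures it over the infinite random-graph structure. The sketch below (hypothetical helper names, and one possible reading of "connected and unconnected nodes" as non-isolated versus isolated) illustrates the finite case:

```python
import random

def erdos_renyi(n, p, seed=0):
    """Sample an undirected Erdos-Renyi graph G(n, p) as a set of edges (i, j), i < j."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def equal_connected_unconnected(n, edges):
    """True iff the number of non-isolated nodes equals the number of isolated ones."""
    touched = {v for e in edges for v in e}  # nodes with at least one incident edge
    return len(touched) == n - len(touched)

# Deterministic checks of the counting property:
print(equal_connected_unconnected(4, {(0, 1)}))  # True: 2 non-isolated, 2 isolated
print(equal_connected_unconnected(2, {(0, 1)}))  # False: 2 non-isolated, 0 isolated
```

The point of the illustration is that the check is inherently a cardinality comparison, not a fixed first-order sentence.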
Then, for two given classifiers, assuming that the classifier loss is a first-order logic formula, C1 ≡ (D, L_c1) and C2 ≡ (D, L_c2), we can construct a first-order expression L_c1 − L_c2 = 0, which is equivalent to the statement that the two classifiers have the same number of connected and unconnected nodes, which leads to a contradiction. This effectively means that no loss function is expressible in RG_ext first-order logic. The rest of the paper deals with the proof of these statements. It is interesting to compare this with the general undecidability of identities for a wide class of functions in [1]. We conclude with a discussion of losses expressible in second- and higher-order logic theories and the immediate implications of adopting them for fairness and interpretability (the extended version of the paper contains more information on each of these topics).
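To make the statement L_c1 − L_c2 = 0 concrete, here is a hedged finite-domain sketch (all names are hypothetical): comparing two classifiers' 0-1 losses over a finite labeled set reduces to comparing error counts, i.e., exactly the kind of counting-equality statement that, per the argument above, is not first-order expressible over the infinite structure.

```python
def error_count(h, labeled_points):
    """Number of points on which classifier h disagrees with the label."""
    return sum(1 for x, y in labeled_points if h(x) != y)

def same_loss(h1, h2, labeled_points):
    """Finite analogue of the sentence 'L_c1 - L_c2 = 0'."""
    return error_count(h1, labeled_points) == error_count(h2, labeled_points)

points = [(0, 0), (1, 1), (2, 1), (3, 0)]
h1 = lambda x: x % 2                # errs on x = 2 and x = 3 -> 2 errors
h2 = lambda x: 1 if x >= 2 else 0   # errs on x = 1 and x = 3 -> 2 errors
print(same_loss(h1, h2, points))    # True
```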

1.1. NOTATIONS AND DEFINITIONS

We will try to make this paper self-contained and provide all necessary references for the reader who would like to invest more time in the mathematical foundations of machine learning and interpretability. We will need some facts from model theory. We assume that the reader is familiar with the concepts of domain, classifier, and loss, as well as the standard body of statistics and probability theory normally used in supervised machine learning. Notations are standard: N denotes the set of natural numbers, Z stands for the integers, and R denotes the reals; R+ denotes the positive reals. L or l normally stands for a loss unless it denotes a space, which is then defined explicitly. S ∼ D means a sample from a distribution D; contextually, D can also stand for a domain. In general, we assume an infinite countable domain. Traditionally, given a hypothesis space H and a domain Z, a loss l is a non-negative real function l : H × Z → R+. We denote by L_D(h) the standard expected loss of a binary classifier h ∈ H, where H is a hypothesis space, with respect to a probability distribution D; by definition, L_D(h) = E_{z∼D}[l_{0-1}(h, z)], and, since for the 0-1 loss Z ranges over pairs (x, y), l_{0-1}(h, (x, y)) = 0 if h(x) = y, and l_{0-1}(h, (x, y)) = 1 if h(x) ≠ y. We also need some definitions from model theory and logic. A filter α on the set of natural numbers N is a collection of sets of natural numbers obeying the following axioms: 1) if E ⊆ F ⊆ N and E ∈ α, then F ∈ α; 2) if E ∈ α and F ∈ α, then E ∩ F ∈ α; 3) ∅ ∉ α.
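The 0-1 loss and its expectation, estimated empirically over a finite sample S ∼ D, can be sketched as follows (a minimal illustration with hypothetical helper names, not part of the formal development):

```python
def zero_one_loss(h, x, y):
    """l_{0-1}(h, (x, y)) = 0 if h(x) = y, else 1."""
    return 0 if h(x) == y else 1

def empirical_loss(h, sample):
    """Empirical estimate of L_D(h) = E_{z~D}[l_{0-1}(h, z)] over a finite sample."""
    return sum(zero_one_loss(h, x, y) for x, y in sample) / len(sample)

# Example: a threshold classifier on integers.
h = lambda x: 1 if x >= 0 else 0
sample = [(-2, 0), (-1, 0), (0, 1), (1, 1), (2, 0)]  # h misclassifies the last point
print(empirical_loss(h, sample))  # 0.2
```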

