TAKE 5: INTERPRETABLE IMAGE CLASSIFICATION WITH A HANDFUL OF FEATURES

Abstract

Deep Neural Networks use thousands of mostly incomprehensible features to identify a single class, a decision no human can follow. We propose an interpretable, sparse, and low-dimensional final decision layer in a deep neural network, with measurable aspects of interpretability, and demonstrate it on fine-grained image classification. We argue that a human can only understand the decision of a machine learning model if the features are interpretable and only very few of them are used for a single decision. To that end, the final layer has to be sparse and, to make interpreting the features feasible, low-dimensional. We call a model with a Sparse Low-Dimensional Decision an "SLDD-Model". We show that an SLDD-Model is easier to interpret locally and globally than a dense high-dimensional decision layer while maintaining competitive accuracy. Additionally, we propose a loss function that improves a model's feature diversity and accuracy. Our more interpretable SLDD-Model uses only 5 out of just 50 features per class, while maintaining 97% to 100% of the accuracy on four common benchmark datasets, compared to the baseline model with 2048 features.

1. INTRODUCTION

Understanding the decisions of deep learning models is becoming more and more important. Especially in safety-critical applications such as the medical domain or autonomous driving, it is often required, either legally (Bibal et al., 2021) or by practitioners, that one can trust a decision and evaluate its reasoning (Molnar, 2020). Due to the high dimensionality of images, most previous work on interpretable models for computer vision combines the deep features computed by a deep neural network with a method that is considered interpretable, such as a prototype-based decision tree (Nauta et al., 2021). While approaches for measuring interpretability without humans exist for conventional machine learning algorithms (Islam et al., 2020), they are missing for methods that include deep neural networks. In this work, we propose a novel sparse and low-dimensional SLDD-Model, which offers measurable aspects of interpretability. The key aspect is a heavily reduced number of features, of which only very few are considered per class. Humans can only consider 7 ± 2 aspects at once (Miller, 1956) and could therefore follow a decision that uses that many features. To
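The core idea of a sparse, low-dimensional decision layer can be illustrated with a minimal NumPy sketch. This is not the authors' training procedure (which is described later in the paper); the hard top-k magnitude pruning below is an illustrative simplification, and the class count and weight values are hypothetical. Only the dimensions (50 features, 5 kept per class) come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_FEATURES = 50  # low-dimensional feature space (50 features, per the abstract)
NUM_CLASSES = 10   # hypothetical number of classes for this sketch
PER_CLASS = 5      # features kept per class decision (5, per the abstract)

# Dense class-weight matrix standing in for a trained final linear layer.
W = rng.normal(size=(NUM_CLASSES, NUM_FEATURES))

def sparsify_per_class(W, k):
    """Keep only the k largest-magnitude weights in each class row."""
    W_sparse = np.zeros_like(W)
    for c in range(W.shape[0]):
        top = np.argsort(np.abs(W[c]))[-k:]  # indices of the k strongest weights
        W_sparse[c, top] = W[c, top]
    return W_sparse

W5 = sparsify_per_class(W, PER_CLASS)

# Each class logit now depends on at most 5 of the 50 features,
# so a human can inspect the handful of evidence terms per class.
features = rng.normal(size=NUM_FEATURES)
logits = W5 @ features
print((W5 != 0).sum(axis=1))  # 5 non-zero weights per class row
```

In this form, explaining a prediction reduces to reading off the five feature activations and their signed weights for the predicted class, which is the property the SLDD-Model exploits.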



Figure 1: Local explanation by our SLDD-Model: the two features used for the predicted class, which emerged without additional supervision, are aligned with human-interpretable attributes and adequately localized (described in App. D).

