IDENTIFYING THE SOURCES OF UNCERTAINTY IN OBJECT CLASSIFICATION

Abstract

In image-based object classification, the visual appearance of objects determines which class they are assigned to. External variables that are independent of the object, such as perspective or lighting conditions, can modify the object's appearance, resulting in ambiguous images that lead to misclassifications. Previous work has proposed methods for estimating the uncertainty of predictions and measuring their confidence. However, such methods do not indicate which variables are the potential sources of that uncertainty. In this paper, we propose a method for image-based object classification that uses disentangled representations to indicate which external variables contribute the most to the uncertainty of the predictions. This information can be used to identify the external variables that should be modified to decrease the uncertainty and improve the classification.

1. INTRODUCTION

An object from the real world can be represented in terms of the data gathered from it through an observation/sensing process. These observations contain information about the properties of the object that allow its recognition, identification, and discrimination. In particular, one can obtain images of objects that represent their visual characteristics, either through photographs or by rendering images from 3D models. Image-based object classification is the task of assigning a category to images obtained from an object based on their visual appearance. The visual appearance of objects in an image is determined by the properties of the object itself (intrinsic variables) and the transformations that occur in the real world (extrinsic variables) (Kulkarni et al., 2015). Probabilistic classifiers based on neural networks can provide a measure of the confidence of a model for a given prediction in terms of a probability distribution over the possible categories an image can be classified into. However, they do not indicate which variable contributes to the uncertainty. In some cases the extrinsic variables can affect the visual appearance of objects in images in such a way that the predictions are highly uncertain. A measure of the uncertainty in terms of these extrinsic features can improve the interpretability of the output of a classifier. Disentangled representation learning is the task of creating low-dimensional representations of data that capture the underlying variability of the data, in particular such that this variability can be explained in terms of the variables involved in data generation. These representations can provide interpretable data representations that can be used for different tasks such as domain adaptation (Higgins et al., 2017), continuous learning (Achille et al., 2018), noise removal (Lopez et al., 2018), and visual reasoning (van Steenkiste et al., 2019).
In this paper we propose a method for identifying the sources of uncertainty in image-based object classification with respect to the extrinsic features that affect the visual appearance of objects in images, by using disentangled data representations. Given an image of an object, our model identifies which extrinsic feature contributes the most to the classification output and provides information on how to modify that feature to reduce the uncertainty in the predictions.

2. RELATED WORK

Achieving explainable results in predictive models is an important task, especially for critical applications in which decisions can affect human life, such as health, security, law, and defence (Barredo Arrieta et al., 2020). Even though deep neural networks provide successful results for image classification, their predictions cannot be directly interpreted due to their complexity (Zhang & Zhu, 2018). To address this, different approaches have been proposed to provide visual interpretability to the results, such as the identification of the image regions that contribute to classification (Selvaraju et al., 2016). The uncertainty of predictions provides an extra level of interpretability by indicating the level of confidence in a prediction (Kendall & Gal, 2017). There are different methods to introduce uncertainty measures in classifiers, including Bayesian neural networks and ensembles. Obtaining disentangled representations, which capture distinct sources of variation independently, is an important step towards interpretable machine learning systems (Kim & Mnih, 2018). Despite the lack of agreement on the definition, one description states that a disentangled representation should separate the distinct, informative factors of variation in the data (Bengio et al., 2012). Within deep generative models, disentanglement is achieved by using neural networks that approximate a conditional distribution on the data and a set of unobserved latent variables. In particular, variational autoencoders (VAEs) (Kingma & Welling, 2014) are heavily favored due to their ability to model a joint distribution while maintaining scalability and training stability (Higgins et al., 2016); therefore most methods are based on augmentations of the original VAE framework (Higgins et al., 2016; Kim & Mnih, 2018; Kulkarni et al., 2015; Mathieu et al., 2018).
In image-based object classification, the variables that explain the visual characteristics of objects can be divided into those which represent the inherent properties of objects and those which represent their transformations. This division has been explored in Kulkarni et al. (2015) by describing the object's properties as the intrinsic variables and the properties that describe the object's transformations as the extrinsic variables. Other work refers to similar sets of variables and their disentanglement under different names but representing similar concepts. Hamaguchi et al. (2019) disentangle ambient variables from object identity information in images. Gabbay & Hoshen (2020) propose learning disentangled representations that express intra-class variability in terms of class and content. Detlefsen & Hauberg (2019) propose the disentanglement of appearance and perspective, and Yang et al. (2015) separate the identity of cars from their pose.
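As background (not part of this paper's contribution), the VAE-based methods cited above typically optimize a weighted evidence lower bound; for instance, the β-VAE objective of Higgins et al. (2016) can be written as

L(θ, φ; x) = E_{q_φ(z|x)}[log p_θ(x|z)] − β · KL(q_φ(z|x) || p(z)),

where q_φ(z|x) is the approximate posterior (encoder), p_θ(x|z) the likelihood (decoder), and p(z) a factorized prior; setting β > 1 penalizes the KL term more strongly and encourages the latent coordinates to capture distinct factors of variation, while β = 1 recovers the standard VAE objective of Kingma & Welling (2014).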

3. SETTING

Consider a dataset of images generated from observations of an object, where each image should be classified into a certain category. We will assume that this category depends only on the properties of the object itself and not on its surroundings. We use a neural network as a probabilistic classifier to assign each image to a category. Usually the output of a neural network cannot be directly interpreted in terms of which characteristics of the object have affected the confidence of the prediction. Disentanglement serves as a method to produce interpretable low-dimensional data representations that separate the variables describing the properties of the objects from those describing their surroundings. The main idea is that one can train a probabilistic classifier on disentangled low-dimensional representations and identify which variables contribute to the uncertainty of the classification.
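The idea above can be illustrated with a toy sketch. Assuming (hypothetically) access to a disentangled code z whose coordinates correspond to known extrinsic variables, and a classifier mapping z to class probabilities, one can perturb each coordinate in turn and record how much the predictive entropy can be reduced; the coordinate with the largest reduction is flagged as the main source of uncertainty. The function names and the linear-softmax classifier below are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    # Predictive entropy as a scalar uncertainty measure.
    return -np.sum(p * np.log(p + 1e-12))

def uncertainty_sources(z, classify, deltas=(-0.5, 0.5)):
    """For each latent coordinate, the largest achievable drop in
    predictive entropy when perturbing only that coordinate."""
    base = entropy(classify(z))
    reductions = []
    for i in range(len(z)):
        best = base
        for d in deltas:
            z_new = z.copy()
            z_new[i] += d
            best = min(best, entropy(classify(z_new)))
        reductions.append(base - best)
    return np.array(reductions)

# Toy linear-softmax classifier on a 3-dimensional latent code;
# only coordinate 0 influences the class logits.
W = np.array([[ 2.0, 0.0, 0.0],
              [-2.0, 0.0, 0.0]])
classify = lambda z: softmax(W @ z)

z = np.array([0.0, 1.0, -1.0])   # ambiguous along coordinate 0
scores = uncertainty_sources(z, classify)
```

Here `scores` ranks the coordinates: perturbing coordinate 0 resolves the ambiguity, while coordinates 1 and 2 leave the prediction (and its entropy) unchanged.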

3.1. PROBABILISTIC CLASSIFIERS ON IMAGES

A probabilistic classifier is a model which outputs a conditional probability distribution P_{Y|x} over the set of K ∈ N possible categories Y = {1, 2, . . . , K}, conditioned on an input image x ∈ X. The value P_{Y|x}(y) can be interpreted as the degree of confidence the model assigns to an image x ∈ X belonging to category y ∈ Y.
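A common concrete instance is a neural network with a softmax output layer, whose predictive entropy serves as a scalar summary of confidence. The numpy stand-in below (an illustrative sketch, not this paper's architecture) shows how P_{Y|x} is obtained from logits and how entropy separates confident from uncertain predictions:

```python
import numpy as np

def predictive_distribution(logits):
    """Softmax turns unnormalized scores into P(Y|x)."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def predictive_entropy(p):
    """High entropy = low confidence; the maximum is log(K) for K classes."""
    return -np.sum(p * np.log(p + 1e-12))

# A peaked logit vector yields a confident prediction,
# a flat one yields a maximally uncertain (uniform) prediction.
p_confident = predictive_distribution(np.array([5.0, 0.0, 0.0]))
p_uncertain = predictive_distribution(np.array([1.0, 1.0, 1.0]))
```

For three classes, `predictive_entropy(p_uncertain)` approaches log(3), the largest value the entropy can take, while the peaked distribution scores much lower.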




