Untangle: Critiquing Disentangled Recommendations

Abstract

The core principle behind most collaborative filtering methods is to embed users and items in latent spaces, where individual dimensions are learned independently of any particular item attributes. It is thus difficult for users to control their recommendations based on particular aspects (critiquing). In this work, we propose Untangle: a recommendation model that gives users control over the recommendation list with respect to specific item attributes (e.g., less violent or funnier movies) that causally relate to user preferences. Untangle uses a refined two-phase training procedure: (i) a (partially) supervised β-VAE that disentangles the item representations, and (ii) a second phase that is optimized to generate recommendations for users. Untangle lets users critique recommendations based on their preferences without sacrificing recommendation accuracy. Moreover, only a tiny fraction of labeled items is needed to create disentangled preference representations over attributes.

1. Introduction

As most standard recommendation models solely aim at increasing the performance of the system, no special care is taken to ensure interpretability of the user and item representations. These representations do not explicitly encode user preferences over item attributes. Hence, they cannot easily be used by users to change, a.k.a. critique (5), the recommendations. For instance, a user in a recipe recommendation system cannot ask for less spicy recipes, as spiciness is not explicitly encoded in the latent space. Moreover, the explainability of the recommendations provided by such systems is very limited. In this work, we enrich a state-of-the-art recommendation model to explicitly encode preferences over item attributes in the user latent space while simultaneously optimizing for recommendation performance. Our work is motivated by disentangled representations in other domains, e.g., manipulating generative models of images with specific characteristics (6) or text with certain attributes (7). Variational Autoencoders (VAEs), particularly β-VAEs (8) (which we adapt here), are generally used to learn these disentangled representations. Intuitively, they optimize embeddings to capture meaningful aspects of users and items independently. Consequently, such embeddings are more usable for critiquing. There are two types of disentangling β-VAEs: unsupervised and supervised. In the former, the representations are disentangled into explanatory factors of variation in an unsupervised manner, i.e., without assuming additional information on the existence (or not) of specific aspects. Used in the original β-VAE (8) approach, this lack of supervision often results in inconsistency and instability in the disentangled representations (9). In contrast, in supervised disentangling, a small subset of the data is assumed to have side information (i.e., a label or a tag). This small subset is then used to disentangle the representations into meaningful factors (10; 9).
As critiquing requires user control using familiar terms/attributes, in this work we incorporate supervised disentanglement in a β-VAE architecture. To achieve the explicit encoding of preferences over item attributes in the embedding space, we refine the training strategy of the Untangle model. We train in two phases: i) Disentangling phase: we explicitly disentangle the item representations, using very few supervised labels. ii) Recommendation phase: we encode the user, using the bag-of-words representation of the items they interacted with, and then generate the list of recommended items. Untangle gives fine-grained control over the recommendations across various item attributes, as compared to the baseline. We achieve this with a tiny fraction of attribute labels over items, while achieving recommendation performance comparable to state-of-the-art baselines.
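To make the disentangling phase concrete, the following is a minimal, hypothetical sketch of its objective: the β-weighted KL term of a β-VAE with a diagonal-Gaussian latent, plus a binary cross-entropy term that ties one latent dimension to each supervised item attribute for the small labeled subset. The function names, the `gamma` weight, and the plain-Python tensors are illustrative assumptions, not the paper's exact implementation.

```python
import math

def sigmoid(x):
    # plain logistic function; adequate for the small toy values used here
    return 1.0 / (1.0 + math.exp(-x))

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def untangle_loss(recon_loss, mu, logvar, beta=4.0,
                  attr_logits=None, attr_labels=None, gamma=1.0):
    """Toy disentangling-phase objective: reconstruction + beta * KL,
    plus (only for labeled items) a binary cross-entropy term that
    supervises one latent dimension per item attribute."""
    loss = recon_loss + beta * kl_to_standard_normal(mu, logvar)
    if attr_logits is not None:
        bce = -sum(y * math.log(sigmoid(a)) + (1 - y) * math.log(1.0 - sigmoid(a))
                   for a, y in zip(attr_logits, attr_labels))
        loss += gamma * bce
    return loss
```

For an unlabeled item, only the β-VAE terms apply, e.g. `untangle_loss(0.1, mu=[0.0, 0.0], logvar=[0.0, 0.0])` returns `0.1`, since the KL of a standard normal against itself is zero; labeled items additionally pay the attribute term.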

2. Related Work

Deep-learning-based autoencoder architectures are routinely used in collaborative filtering and recommendation models (11; 12; 13). In particular, (11; 12) adopt denoising autoencoder architectures, whereas (13) uses variational autoencoders. The internal (hidden) representations generated by the encoders in these models are not interpretable and hence cannot be used for critiquing or explanations in recommendations. Recent work on Variational Autoencoders across domains has focused on the task of generating disentangled representations. One of the first approaches to that end was β-VAE (8; 14; 15), which essentially enforces a stronger KL-divergence constraint on the VAE objective by multiplying that term with β > 1. Such representations are more controllable and interpretable than those of VAEs. One of the drawbacks of β-VAE is that the disentanglement of the factors cannot be controlled; the resulting representations are relatively unstable and not easy to reproduce, particularly when the factors of variance are subtle (9; 8; 14; 16; 17). This has motivated methods that explicitly supervise the disentangling (10), which rely either on selecting a good set of disentangled representations across multiple runs using the label information (18), or on adding a supervised loss to the β-VAE objective function (10). As supervised disentangling methods offer better explainability and can provide control over desired attributes, we motivate our model from (19) for better critiquing in VAE-based recommendation systems. In recommender systems, similar methods that utilize side information have also been used recently to build models that enable critiquing of recommendations. These models allow users to tune the recommendations across some provided attributes/dimensions. Notable examples are (20; 21), where the models are augmented with a classifier of the features over which to control the recommendation.
Adjusting the features at the output of the classifier modifies the internal hidden state of the model and leads to recommendations that exhibit (or not) the requested attribute. Note that this method of critiquing is quite different from our approach, which allows for a gradual adjustment of the attributes. Moreover, the models in (20; 21) require a dataset fully labeled with respect to the attributes, while our approach only requires a small fraction of labeled data.
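For reference, the β-VAE objective discussed above is the standard VAE evidence lower bound with a reweighted KL term:

```latex
\mathcal{L}(\theta, \phi; x, \beta) \;=\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
```

Setting β > 1 strengthens the pressure toward the factorized prior p(z) = N(0, I), which encourages independent (disentangled) latent dimensions; the supervised variants such as (10) additionally attach a classification loss on a subset of latent dimensions for the labeled items.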



Figure 1: The Untangle model is trained in two phases. Disentangling phase: the input to the encoder is a one-hot representation of an item (green dotted line); the obtained representation is disentangled across A attributes. Recommendation phase: the input to the encoder is the set of items the user interacted with (solid red line), and the model recommends new items.

