INTERPRETABLE RELATIONAL REPRESENTATIONS FOR FOOD INGREDIENT RECOMMENDATION SYSTEMS

Anonymous authors
Paper under double-blind review

Abstract

Supporting chefs with ingredient recommender systems to create new recipes is challenging, as good ingredient combinations depend on many factors, such as taste, smell, cuisine style, and texture. There have been few attempts to address these issues using machine learning. Useful machine learning models obviously need to be accurate but, especially for food professionals, they also need to be interpretable. To address these issues, we propose the Interpretable Relational Representation Model (IRRM). The main component of the model is a key-value memory network that represents relationships between ingredients. We propose and test two variants of the model. One learns latent relational representations over a trainable memory network (Implicit model), and the other learns explainable relational representations over a pre-trained memory network that integrates an external knowledge base (Explicit model). The relational representations resulting from the model are interpretable: they allow users to inspect why certain ingredient pairings have been suggested. The Explicit model additionally allows integrating any number of manually specified constraints. We conduct experiments on two recipe datasets, CulinaryDB with 45,772 recipes and Flavornet with 55,001 recipes. The experimental results show that our models are both predictive and informative.

1. INTRODUCTION

Data mining and machine learning methods play an increasingly prominent role in food preference modeling, food ingredient pairing discovery, and new recipe generation. Solving these tasks is nontrivial, since the goodness of ingredient combinations depends on many factors, such as taste, smell, cuisine, texture, and culture. Ahn et al. (2011) found that the number of shared flavor molecules between ingredients is one of the important factors for food pairing: Western cuisines show a tendency to use ingredient pairs that share many flavor compounds, while East Asian cuisines tend to avoid compound-sharing ingredients. Using this idea, Garg et al. (2017) developed a rule-based food pairing system that ranks ingredients based on the number of shared flavor molecules. Recently, Park et al. (2019) suggested a neural network approach based on flavor molecules and the co-occurrence of ingredients in recipes. These approaches focus on one-to-one food pairing. There is also research on many-to-one pairing. De Clercq et al. (2016) proposed the Recipe Completion Task, which tries to identify matching ingredients for a partial list of ingredients (the recipe) using a matrix factorization based recommender system. Although efforts have been made to detect good ingredient combinations, there is currently no machine learning method in this field that allows interpreting why suggested pairs are good. Our work is targeted at interpretable recommendation systems for food pairing and recipe completion. Given a set of one or more pre-selected ingredients chosen by a user, the recommender suggests the top-N ingredients from a set of candidates. For example, if a user selects apple and chocolate as the pre-selected ingredients, our recommender suggests well-paired ingredients (e.g. cinnamon) and also identifies reasons (e.g. cinnamon goes well with apple and chocolate in terms of flavor affinity).
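The top-N recommendation setting described above can be sketched in a few lines. The scoring function here is a stand-in for any pairing model (the toy co-occurrence counts and the helper names are our illustrative assumptions, not part of the paper):

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

def recommend_top_n(
    preselected: Set[str],
    all_ingredients: Set[str],
    score: Callable[[Set[str], str], float],
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidate ingredients against a pre-selected set.

    Candidates are all ingredients not already selected; `score`
    stands in for any pairing model (e.g. the IRRM).
    """
    candidates = all_ingredients - preselected
    ranked = sorted(candidates, key=lambda c: score(preselected, c), reverse=True)
    return [(c, score(preselected, c)) for c in ranked[:n]]

# Toy pairwise co-occurrence counts (illustrative only).
PAIR_COUNTS: Dict[FrozenSet[str], int] = {
    frozenset({"apple", "cinnamon"}): 42,
    frozenset({"chocolate", "cinnamon"}): 17,
    frozenset({"apple", "basil"}): 1,
}

def cooccurrence_score(preselected: Set[str], candidate: str) -> float:
    # Sum pairwise counts between the candidate and each selected ingredient.
    return sum(PAIR_COUNTS.get(frozenset({i, candidate}), 0) for i in preselected)

print(recommend_top_n({"apple", "chocolate"},
                      {"apple", "chocolate", "cinnamon", "basil"},
                      cooccurrence_score, n=2))
# → [('cinnamon', 59), ('basil', 1)]
```

A real model would replace `cooccurrence_score` with a learned scorer; the surrounding ranking logic stays the same for both one-to-one (|preselected| = 1) and many-to-one queries.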
For this, we propose the Interpretable Relational Representation Model (IRRM) in two variants to address the food pairing and recipe completion tasks. The model features a key-value memory network (Sukhbaatar et al. (2015), Miller et al. (2016)) to represent relationships between ingredients. One variant is trained to learn latent relational representations over a trainable memory network (Implicit model). The other learns explainable relational representations over a pre-trained memory network integrating an external knowledge base (Explicit model). The relational representations are interpretable and can be queried for the reasons why the ingredients have been suggested. The Explicit model can integrate any number of constraints, which can be chosen manually based on the characteristics of the desired recommender system. Our contributions are as follows:

1. We model ingredient pairing as a general recommendation task with implicit feedback.
2. We introduce the Interpretable Relational Representation Model and its two variants, Implicit and Explicit, both of which can learn pair-specific relational representations (vectors) for one-to-one (i.e. ingredient to ingredient) and many-to-one (ingredient set to ingredient) food pairing tasks. The relational vectors are also interpretable.
3. We propose a training procedure to learn one-to-one and many-to-one relationships effectively using recipes.
4. We evaluate our proposed models on the Recipe Completion Task and the Artificial Food Pairing Task on the CulinaryDB and Flavornet datasets. Our proposed approaches demonstrate competitive results on all datasets, outperforming many other baselines.
5. We perform a qualitative analysis. The results show that our proposed Explicit model is capable of unraveling hidden ingredient structures within recipes.
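The key-value memory read underlying both variants can be sketched as follows. The dimensions, the additive pair query, and the plain softmax attention are our illustrative assumptions, not the paper's exact architecture; the point is that the per-slot attention weights are what make the resulting relational vector inspectable:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(query: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Single key-value memory read: attend over keys, mix values.

    query: (d,); keys, values: (n_slots, d). In the Explicit variant the
    slots would correspond to KB relations, so the attention weights can
    be read off as an explanation for a suggested pairing.
    """
    attn = softmax(keys @ query)   # (n_slots,) attention over memory slots
    return attn @ values           # (d,) relational representation

rng = np.random.default_rng(0)
d, n_slots = 8, 4
keys = rng.standard_normal((n_slots, d))
values = rng.standard_normal((n_slots, d))

# Query formed from a pair of ingredient embeddings (illustrative choice).
apple, chocolate = rng.standard_normal(d), rng.standard_normal(d)
relation = memory_read(apple + chocolate, keys, values)
print(relation.shape)  # → (8,)
```

In the Implicit variant the keys and values would be trainable parameters; in the Explicit variant they would be pre-trained from the external knowledge base.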

2. RELATED WORK

There are two related streams of work in recommender systems that are important for this paper: the session-based setting and knowledge-aware systems. In the session-based setting, a user profile can be constructed from past user behavior. A natural solution to this problem is the item-to-item recommendation approach, and a variety of methods exist for it. For example, Quadrana et al. (2017) model the item sequence using RNNs, Kang & McAuley (2018) use self-attention layers, and Wu et al. (2020) use Transformer layers. While these methods mainly focus on how to encode item click-sequence interactions, we target good ingredient pairing using only ingredient attributes and the relationship between an ingredient set and an ingredient based on co-occurrence in recipes. For this we develop a new architecture integrating set encoders and relational memory with novel loss and score functions. There are also an increasing number of methods for integrating knowledge into recommenders. Zhang et al. (2016) and Cheng et al. (2016) directly incorporate user and item features as a user profile into neural network models. Huang et al. (2018) and Wang & Cai (2020) integrate them using a pre-trained knowledge graph. These methods try to represent the user context using an external knowledge base; the resulting knowledge embeddings are therefore usually integrated into user embeddings. In this work, we incorporate knowledge specifically to detect relationships between an ingredient set and an ingredient, both to support interpretation and to improve recommendation performance.

3. PROBLEM DEFINITION

We first introduce the notation used throughout this paper. We model recipe completion as a recommendation scenario with implicit feedback (Huang et al., 2018; Tay et al., 2018). In such scenarios, a user has interacted with items, and the system infers the item the user will interact with next based on the user's interaction records. We apply this to the food domain by using recipes as interaction records. Let I denote the set of ingredients and {i_1, . . . , i_M} denote a pre-selected ingredient set, where each i ∈ I is an ingredient and M is the number of pre-selected ingredients. Next, let I_candidate denote the set of candidate ingredients. I_candidate depends on the pre-selected ingredient set, that is, I_candidate = I - {i_1, . . . , i_M}. In addition, we assume that a knowledge base (KB) of ingredients is available and that the KB contains factors related to why some ingredients are good combinations. A KB is defined as a set of triplets over a
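Concretely, the candidate set construction and a KB of (head, relation, tail) triplets can be sketched as follows; the ingredient names and relation labels are illustrative and not taken from the paper's KB schema:

```python
# Problem-setting sketch: pre-selected set, candidate set, and KB triplets.
ingredients = {"apple", "chocolate", "cinnamon", "basil", "clove"}   # I
preselected = {"apple", "chocolate"}                                 # {i_1, ..., i_M}

# Candidate set is everything not already selected: I_candidate = I - {i_1, ..., i_M}
candidates = ingredients - preselected

# A knowledge base as (head, relation, tail) triplets; relation names
# here are made up for illustration.
kb = {
    ("apple", "shares_compound_with", "cinnamon"),
    ("cinnamon", "has_flavor", "sweet-spicy"),
}

print(sorted(candidates))  # → ['basil', 'cinnamon', 'clove']
```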

