CONSTRAINING LATENT SPACE TO IMPROVE DEEP SELF-SUPERVISED E-COMMERCE PRODUCT EMBEDDINGS FOR DOWNSTREAM TASKS

Abstract

The representation of products in an e-commerce marketplace is a key asset for improving the user experience on the site. Well-known examples of the importance of good product representations are tasks such as product search and product recommendation. There is, however, a multitude of lesser-known tasks relevant to the business, such as the detection of counterfeit items, the estimation of package sizes, or the categorization of products. In this setting, good vector representations of products that can be reused across different tasks are very valuable. Recent years have seen a major increase in research on latent representations for products in e-Commerce. Examples are models like Prod2Vec or Meta-Prod2Vec, which leverage the information of a user session to generate product vectors that can be used in product recommendation. This work proposes a novel deep encoder model for learning product embeddings to be applied in several downstream tasks. The model uses pairs of products that appear together in a user browsing session and adds a proximity constraint to the final latent space in order to project the embeddings of similar products close to each other. This has a regularization effect that yields better feature representations for use across multiple downstream tasks; we explore this effect in our experiments by assessing its impact on task performance. Our experiments show effectiveness in transfer learning scenarios comparable to several industrial baselines.

1. INTRODUCTION

The e-Commerce environment has been growing at a fast rate in recent years. As such, new tasks pose new challenges to be solved. Some key tasks like product search and recommendation usually have large amounts of data available and dedicated teams working on them. On the other hand, some lesser-known but still valuable tasks have less high-quality annotated data available, and the main goal is to solve them with a small investment. Examples of the latter are counterfeit/forbidden product detection, package size estimation, etc. For these scenarios, the use of complex systems is discouraged in favor of industry-proven baselines like bag-of-words or fastText (Joulin et al., 2016). In particular, with the advent of "Feature Stores" (Li et al., 2017), industrial applications are seeing a rise in the adoption of organization-wide representations of business entities (customers, products, etc.). These are needed in order to speed up the process of building machine learning pipelines and to enable both batch training and real-time predictions with as little effort as possible.

In the present work we explore representation learning for marketplace products to apply in downstream tasks. More specifically, we aim to train an encoder that can transform products into embeddings to be used as features of a linear classifier for a specific task, thus avoiding feature engineering for the task. The encoder is trained in a self-supervised fashion by leveraging browsing session data of users in our marketplace. Using product metadata and an architecture inspired by the recent work of Grill et al. (2020), we explore how the use of pairs of products in a session can enable transfer learning into several downstream tasks. As we discuss further in Section 3, we extend the work of Grill et al. (2020) with a new objective function that combines their
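To make the transfer-learning setup concrete, the following is a minimal sketch (not the paper's implementation) of the linear-probe scheme described above: a frozen encoder maps products to embeddings, and only a linear classifier is trained on those embeddings for a downstream task. The `encode` function and all data here are illustrative stand-ins; a fixed random projection plays the role of the pretrained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the frozen, pretrained encoder: a fixed random projection
# from 32-dim product features to an 8-dim embedding space (assumption).
W_enc = rng.normal(size=(32, 8))

def encode(products):
    """Frozen encoder: products (n, 32) -> unit-norm embeddings (n, 8)."""
    z = products @ W_enc
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Toy downstream task: a binary label that is linear in the latent space.
X = rng.normal(size=(200, 32))
Z = encode(X)                      # embeddings are computed once and frozen
true_w = rng.normal(size=8)
y = (Z @ true_w > 0).astype(float)

# Train only the linear head (logistic regression via gradient descent);
# no gradients flow into the encoder, so no task-specific features are built.
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Z @ w)))
    w -= 0.5 * Z.T @ (p - y) / len(y)  # gradient step on the log-loss

acc = ((Z @ w > 0) == (y > 0.5)).mean()
print(f"linear-probe accuracy: {acc:.2f}")
```

In practice the encoder would be the trained deep model and the downstream labels would come from tasks such as counterfeit detection; the point of the sketch is only the division of labor between a shared frozen representation and a cheap per-task linear head.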

