MASKFUSION: FEATURE AUGMENTATION FOR CLICK-THROUGH RATE PREDICTION VIA INPUT-ADAPTIVE MASK FUSION

Abstract

Click-through rate (CTR) prediction plays an important role in advertisement, recommendation, and retrieval applications. Given a feature set, how to fully utilize its information is an active topic in deep CTR model design. Several existing deep CTR works focus on feature interactions, feature attention, and so on; they attempt to capture high-order feature interactions to enhance the generalization ability of deep CTR models. However, these works either suffer from poor high-order feature interaction modeling with DNN or ignore the balance between generalization and memorization during recommendation. To mitigate these problems, we propose an adaptive feature fusion framework called MaskFusion, which, beyond the feature interactions proposed in existing works, dynamically captures explicit interactions between the input features and the existing deep part of deep CTR models. MaskFusion is an instance-aware feature augmentation method that makes deep CTR models more personalized by assigning each feature an instance-adaptive mask and fusing each feature with each hidden state vector in the deep part. MaskFusion can be flexibly integrated into any existing deep CTR model. It achieves state-of-the-art (SOTA) performance on seven benchmark deep CTR models across three public datasets.

1. INTRODUCTION

Click-through rate (CTR) prediction plays an important role in the field of personalized services. Factorization Machine (FM) Rendle (2010) based models are common solutions in recommendation systems. These methods transform the raw high-dimensional sparse features into low-dimensional dense real-valued vectors by embedding techniques and then enumerate all possible feature interactions, thus avoiding sophisticated manual feature engineering. Deep CTR models, such as DCN Wang et al. (2017) and xDeepFM Lian et al. (2018), treat DNN as a complementary tool to the feature interaction layer. These models utilize the feature interaction layer and the powerful representation ability of DNN to automatically learn high-order feature interactions in explicit and implicit ways, respectively. In fact, these complex structures are designed to address a major challenge in recommendation: the generalization performance of CTR models Cheng et al. (2016). For example, CTR models with strong generalization performance can better explore feature combinations that have never appeared in historical data, thus recommending new items that users may be interested in. Recently, some works have attempted to further improve generalization performance by addressing the problem that DNN struggles to accurately capture high-order feature interaction patterns Qu et al. (2018). MaskNet Wang et al. (2021b) proposed an instance-guided mask, generated from the global information of each instance, to dynamically enhance the informative elements of the hidden state vectors by introducing multiplicative operations into DNN. Although these methods optimize the input or hidden states of the deep part of deep CTR models to a certain extent according to the global information of each instance, they do not pay attention to another major challenge in recommendation: the memorization ability of CTR models Cheng et al. (2016).
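To make the instance-guided masking idea concrete, the following is a minimal NumPy sketch: all weights are random placeholders, and the shapes and variable names are illustrative assumptions rather than MaskNet's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Toy shapes: batch of 4 instances, 5 fields, embedding dim 8, hidden dim 16.
B, F, D, H = 4, 5, 8, 16
V = rng.normal(size=(B, F * D))   # flattened field embeddings (the instance's "global information")

# Mask generator: a small two-layer MLP mapping the instance embedding to a
# mask over the hidden state (weights here are untrained placeholders).
W1, W2 = rng.normal(size=(F * D, 32)), rng.normal(size=(32, H))
mask = relu(V @ W1) @ W2          # instance-wise mask, shape (B, H)

hidden = rng.normal(size=(B, H))  # a hidden state of the deep part
masked_hidden = hidden * mask     # multiplicative (element-wise) re-weighting

print(masked_hidden.shape)        # (4, 16)
```

The key point is the element-wise product: each instance re-weights its own hidden units, so the same DNN behaves differently per input.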
For example, a model with strong memorization ability can better use the information available in historical data to make relevant recommendations that match user habits, rather than making less relevant recommendations due to over-generalization. To address the limitations of existing work, we propose an input-adaptive feature augmentation framework, named MaskFusion, which brings non-trivial performance improvements when flexibly incorporated into various state-of-the-art deep CTR models and can be trained in an end-to-end manner. Different from existing methods, MaskFusion first enhances, through explicit fusion operations, the memorization ability of deep CTR models, which previous SOTA CTR models have paid little attention to. Second, it uses a Mask Controller to make a better trade-off between generalization and memorization. Furthermore, incorporated with MaskFusion, CTR models make predictions for an input instance by using instance-wise masks to uniquely enhance each feature of that instance, so that the whole model becomes personalized at the instance level during both training and inference. We summarize our contributions below:

• We propose an instance-aware Mask Controller that dynamically selects the features that need to be better memorized for the prediction task, according to the characteristics and behaviors of each input instance, and thus better balances memorization and generalization.

• We conduct comprehensive experiments on 7 benchmarks over 3 real-world datasets; the convincing results demonstrate the effectiveness and robustness of MaskFusion. Hyperparameter studies show that MaskFusion is a memory-friendly and efficient framework, achieving better performance with fewer parameters and less memory.
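To illustrate the interplay of a per-feature gate and feature–hidden-state fusion, here is a heavily simplified NumPy sketch; the sigmoid gate, the additive fusion, and all names (`gates`, `fused`, etc.) are our own illustrative assumptions, not the paper's actual MaskFusion architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy shapes: batch 4, 5 fields; embedding dim equals hidden dim for direct fusion.
B, F, D, H = 4, 5, 8, 8
feats = rng.normal(size=(B, F, D))

# Hypothetical mask controller: from the instance's own embeddings, produce one
# gate per feature per instance, deciding how strongly that feature is
# "memorized" (fused into the deep part) versus left to implicit DNN modeling.
Wc = rng.normal(size=(F * D, F))
gates = sigmoid(feats.reshape(B, -1) @ Wc)   # (B, F), instance-adaptive, in (0, 1)

hidden = rng.normal(size=(B, H))             # a hidden state of the deep part

# Fuse each gated feature with the hidden state (additive fusion sketch).
fused = hidden + (gates[:, :, None] * feats).sum(axis=1)   # (B, H)
print(fused.shape)
```

Because the gates depend on the instance itself, two different inputs fuse their raw features into the deep part with different strengths, which is the instance-level personalization described above.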

2. RELATED WORK

DNN has begun to benefit recommendation systems thanks to its powerful feature representation ability. Many works combine explicit feature interactions with DNN in deep CTR models. PNN Qu et al. (2016) introduces a product layer between the embedding layer and DNN to explicitly learn feature interactions.
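The inner-product variant of PNN's product layer can be sketched in a few lines of NumPy (toy shapes, random placeholder embeddings):

```python
import numpy as np

rng = np.random.default_rng(2)
F, D = 4, 6                    # 4 fields, embedding dim 6
emb = rng.normal(size=(F, D))  # embeddings of one instance's F fields

# Inner-product layer: pairwise dot products between all field embeddings,
# yielding F*(F-1)/2 explicit second-order interaction signals that are
# concatenated with the embeddings and fed into the DNN.
pairs = [emb[i] @ emb[j] for i in range(F) for j in range(i + 1, F)]
interactions = np.array(pairs)
print(interactions.shape)      # (6,) for F = 4
```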



Figure 1: Deep learning based Click-Through Rate prediction architecture

Corresponding author: Jianchao Tan.

According to the relative position of DNN and the feature interaction layer, existing deep CTR models can be divided into single tower models and dual tower models Zhang et al. (2021). Single tower models, such as Product-based Neural Networks (PNN) Qu et al. (2016), DLRM Naumov et al. (2019), and so on, can capture high-order feature interactions to a certain degree due to their architectural complexity. Dual tower models, such as Deep & Cross Network (DCN) Wang et al. (2017) and xDeepFM Lian et al. (2018), treat DNN as a complementary tool to the feature interaction layer.
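As a concrete example of the explicit-interaction tower in a dual tower model, DCN's cross network follows the recursion x_{l+1} = x_0 (x_l · w_l) + b_l + x_l, so layer l encodes up to (l+2)-order bit-wise interactions. A minimal NumPy sketch with random placeholder weights:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8                                       # length of the concatenated embedding
x0 = rng.normal(size=d)                     # input vector to the cross network
w = [rng.normal(size=d) for _ in range(3)]  # one weight vector per cross layer
b = [np.zeros(d) for _ in range(3)]         # one bias vector per cross layer

# Cross network recursion: x_{l+1} = x0 * (x_l . w_l) + b_l + x_l.
# The residual term x_l keeps lower-order interactions; the x0 * (...) term
# raises the interaction order by one at each layer.
x = x0
for wl, bl in zip(w, b):
    x = x0 * (x @ wl) + bl + x

print(x.shape)  # (8,)
```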

GemNN Fei et al. (2021) introduces a gating mechanism between DNN and the embedding layer of deep CTR models to learn bit-wise feature importance: feature embeddings pass through a gating layer before being fed into DNN, rather than being fed into DNN directly. In this way, the DNN can learn more effective feature interactions. MaskNet Wang et al. (2021b) introduces an instance-guided mask that applies multiplicative operations to the hidden state vectors of DNN according to the global information of each instance.
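The bit-wise gating idea can be sketched as follows; the single fully connected sigmoid gate below is an illustrative assumption rather than GemNN's exact layer:

```python
import numpy as np

rng = np.random.default_rng(4)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

B, F, D = 4, 5, 8
emb = rng.normal(size=(B, F * D))   # flattened field embeddings of a batch

# Bit-wise gate over the flattened embedding: each element receives its own
# importance weight in (0, 1) before the embedding enters the DNN.
Wg = rng.normal(size=(F * D, F * D))
gate = sigmoid(emb @ Wg)
gated_emb = gate * emb              # fed to the DNN instead of the raw embedding
print(gated_emb.shape)              # (4, 40)
```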

We propose an input-adaptive feature augmentation framework, named MaskFusion, which captures the interactions between feature embeddings and the deep part of deep CTR models adaptively and explicitly. The MaskFusion framework is general enough to be incorporated with other functionalities such as Residual Feature Augmentation in DCNv2 Wang et al. (2021a) and Embedding Dimension Search Shen et al. (2020).

