MESSAGENET: MESSAGE CLASSIFICATION USING NATURAL LANGUAGE PROCESSING AND META-DATA

Abstract

In this paper we propose a new Deep Learning (DL) approach for message classification. Our method is based on state-of-the-art Natural Language Processing (NLP) building blocks, combined with a novel technique for infusing the meta-data inputs that are typically available in messages, such as sender information, timestamps, attached images, audio, affiliations, and more. As we demonstrate throughout the paper, going beyond the mere text by leveraging all available channels in a message can yield an improved representation and higher classification accuracy. To obtain the message representation, each type of input is processed by a dedicated block in the neural network architecture that is suited to that data type. This design enables training all blocks simultaneously and forming cross-channel features within the network. We show in the Experiments Section that in some cases a message's meta-data holds additional information that cannot be extracted from the text alone, and that using this information achieves better performance. Furthermore, we demonstrate that our multi-modality block approach outperforms other approaches for injecting the meta-data into the text classifier.
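The per-channel design outlined above can be sketched in simplified form. The following is a minimal NumPy illustration only: the layer sizes, random weights, and the concatenation-based fusion are made-up assumptions for exposition, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """A single fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

# One dedicated block per input channel (all dimensions are illustrative).
W_text = rng.normal(size=(768, 64)); b_text = np.zeros(64)  # text-embedding channel
W_meta = rng.normal(size=(8, 64));   b_meta = np.zeros(64)  # meta-data feature channel
W_head = rng.normal(size=(128, 3));  b_head = np.zeros(3)   # fused features -> 3 classes

def classify(text_emb, meta_feats):
    h_text = dense(text_emb, W_text, b_text)    # text channel block
    h_meta = dense(meta_feats, W_meta, b_meta)  # meta-data channel block
    fused = np.concatenate([h_text, h_meta])    # cross-channel features form here
    logits = fused @ W_head + b_head
    return int(np.argmax(logits))

label = classify(rng.normal(size=768), rng.normal(size=8))
```

Because both channel blocks feed a shared head, gradients from the classification loss would reach all blocks at once during training, which is the sense in which the blocks are trained together.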



Many applications require message classification and regression, such as handling spam emails Karim et al. (2020), ticket routing Han et al. (2020), article sentiment review Medhat et al. (2014), and more. Accurate message classification could improve critical scenarios such as call centers (routing tickets based on topic) Han et al. (2020), alert systems (flagging highly important alert messages) Gupta et al. (2012), and categorizing incoming messages (automatically uncluttering emails) Karim et al. (2020). The main distinction between text and message classification is the availability of additional attributes, such as sender information, timestamps, attached images, audio, affiliations, and more. New message classification contests often appear on prominent platforms (e.g., Kaggle), showing how sought after this topic is. There are already many data-sets to explore in this field, but no clearly winning algorithm that fits all scenarios with high accuracy, efficiency, and simplicity (in terms of implementation and interpretation).

A notable advancement in the field of NLP is the attention-based transformer architecture Vaswani et al. (2017). This family of methods excels at finding local connections between words and at better understanding the meaning of a sentence. A leading example is the Bidirectional Encoder Representations from Transformers (BERT) Devlin et al. (2018), as well as its variations Liu et al. (2019); Lan et al. (2019); Sanh et al. (2019), which win certain benchmarks Rajpurkar et al. (2018); Wang et al. (2019). Several packages, such as Huggingface Transformers Wolf et al. (2019), make such models accessible and easy to use, and provide pre-trained versions. In addition, one can use transfer learning Pan & Yang (2009) to further train BERT on one's own data, creating a model tailored to the specific task at hand. BERT, and often other transformer-based models, are designed to handle text. They operate on the words of a given text by encoding them into tokens, and through the connections between the tokens they learn the context of sentences. This approach is limited, since sometimes more information, not necessarily textual, can be extracted and used. Throughout this paper we refer to such information as meta-data, to distinguish it from the main stream of textual content (though one may recognize it as the core data, depending on the application). For example, a meta-data attribute could be the time stamp
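The token-encoding step mentioned above can be illustrated with a toy greedy longest-match sub-word tokenizer, in the spirit of BERT's WordPiece scheme. The vocabulary below is a made-up example, not BERT's actual vocabulary, and real tokenizers add special tokens and other details omitted here.

```python
# Toy greedy longest-match sub-word tokenizer (WordPiece-style sketch).
VOCAB = {"un", "##break", "##able", "break", "##ing", "the", "news"}

def tokenize_word(word):
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Try the longest remaining substring first, shrinking until a match.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # "##" marks a word-internal continuation
            if piece in VOCAB:
                tokens.append(piece)
                start = end
                break
            end -= 1
        else:
            return ["[UNK]"]  # no known sub-word matches this position
    return tokens

def tokenize(text):
    return [t for w in text.lower().split() for t in tokenize_word(w)]

print(tokenize("unbreakable news"))  # ['un', '##break', '##able', 'news']
```

Splitting rare words into known sub-word pieces is what lets such models keep a fixed-size vocabulary while still representing unseen words.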

