TRACKING THE PROGRESS OF LANGUAGE MODELS BY EXTRACTING THEIR UNDERLYING KNOWLEDGE GRAPHS

Anonymous authors
Paper under double-blind review

Abstract

The state of the art in language models, previously dominated by pre-trained word embeddings, is now being pushed forward by large pre-trained contextual representations. This success has driven growing interest in understanding what these models encode internally. Despite this, understanding their semantic skills has been elusive, often leading to unsuccessful, inconclusive, or contradictory results across different works. In this work, we define a probing classifier that we use to extract the underlying knowledge graph of nine of the currently most influential language models, including word embeddings, context encoders, and text generators. This probe is based on concept relatedness, grounded in WordNet. Our results show that this knowledge is present in all the models, but with several inaccuracies. Furthermore, we show that different pre-training strategies and architectures lead to different model biases. We conduct a systematic evaluation to discover specific factors that explain why some concepts are challenging for the different families of models. We hope our insights will motivate the future development of models that capture concepts more precisely.

1. INTRODUCTION

Natural language processing (NLP) encompasses a wide variety of applications such as summarization (Kovaleva et al., 2019), information retrieval (Zhan et al., 2020), and machine translation (Tang et al., 2018), among others. Currently, the use of pre-trained language models has become the de facto starting point to tackle most of these applications. The usual pipeline consists of fine-tuning a pre-trained language model with a discriminative learning objective to adapt the model to the requirements of each specific task. As key ingredients, these models are pre-trained on massive amounts of unlabeled data that can include millions of documents, and may comprise billions of parameters. Massive data and parameters are supplemented with a suitable learning architecture, resulting in a highly powerful but also complex model whose internal operation is hard to analyze. The success of pre-trained language models has driven interest in understanding how they manage to solve NLP tasks. As an example, in the case of BERT (Devlin et al., 2019), one of the most popular pre-trained models based on a Transformer architecture (Vaswani et al., 2017), several studies have attempted to access the knowledge encoded in its layers and attention heads (Tenney et al., 2019b; Devlin et al., 2019; Hewitt & Manning, 2019). In particular, Jawahar et al. (2019) show that BERT can solve tasks at a syntactic level by using Transformer blocks to encode a soft hierarchy of features at different levels of abstraction. Similarly, Hewitt & Manning (2019) show that BERT is capable of encoding structural information from text: using a structural probe, they show that syntax trees are embedded in a linear transformation of the encodings provided by BERT.
In general, previous efforts have provided strong evidence that current pre-trained language models encode complex syntactic rules; however, relevant evidence about their ability to capture semantic information remains elusive. As an example, a recent study (Si et al., 2019) attempts to locate the encoding of semantic information in the top layers of Transformer architectures, but its results are contradictory. Similarly, Kovaleva et al. (2019) study the knowledge encoded by self-attention weights, but their results provide evidence of over-parameterization rather than of language understanding capabilities.
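To make the probing idea concrete, the sketch below shows the general recipe shared by structural and semantic probes: a simple classifier is trained on top of frozen representations, so any predictive signal must already be present in the model's encodings. This is an illustrative toy, not the probe proposed in this paper: the synthetic vectors stand in for real model embeddings, the absolute-difference features and the `probe` classifier are assumptions for the example, and a real probe would use actual model outputs and WordNet-derived concept pairs.

```python
# Minimal sketch of a probing classifier for pairwise concept relatedness.
# Hypothetical setup: synthetic vectors stand in for frozen embeddings
# from a pre-trained language model; labels stand in for WordNet relations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, n_pairs = 64, 500

# Synthetic "frozen" embeddings: related pairs are near-duplicates of a
# shared vector, unrelated pairs are independent draws.
anchors = rng.normal(size=(n_pairs, dim))
related_partner = anchors + 0.1 * rng.normal(size=(n_pairs, dim))
unrelated_partner = rng.normal(size=(n_pairs, dim))

# Featurize each pair by the element-wise absolute difference, so that a
# linear classifier can separate "close" pairs from "distant" ones.
X = np.vstack([np.abs(anchors - related_partner),
               np.abs(anchors - unrelated_partner)])
y = np.array([1] * n_pairs + [0] * n_pairs)

# The probe itself: a simple linear classifier trained on top of the
# frozen features; the underlying "model" is never updated.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe accuracy: {probe.score(X, y):.2f}")
```

The key design choice, common to probing work, is keeping the classifier deliberately simple: if even a linear probe recovers the relation, the information is plausibly encoded in the representations themselves rather than learned by the probe.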

