WEBBRAIN: LEARNING TO GENERATE FACTUALLY CORRECT ARTICLES FOR QUERIES BY GROUNDING ON LARGE WEB CORPUS

Abstract

In this paper, we introduce a new NLP task: generating short factual articles for queries by mining supporting evidence from the Web. In this task, called WEBBRAIN, the ultimate goal is to generate a fluent, informative, and factually correct short article (e.g., a Wikipedia article) for a factual query unseen in Wikipedia. To enable experiments on WEBBRAIN, we construct a large-scale dataset, WebBrain-Raw, by extracting English Wikipedia articles and their crawlable Wikipedia references. WebBrain-Raw is ten times larger than the previous largest peer dataset and can greatly benefit the research community. In addition, we empirically analyze the performance of current state-of-the-art NLP techniques on WEBBRAIN and introduce a new framework, ReGen, which enhances generation factuality through improved evidence retrieval and task-specific pre-training for generation. Experimental results show that ReGen outperforms all baselines in both automatic and human evaluations.

1. INTRODUCTION

Information acquisition is one of the fundamental daily needs of human beings, and acquiring information from the Web is undoubtedly a convenient and efficient way to meet it. However, with the exponential growth of the Web, online information has become scattered and evolves quickly, making it challenging for users to find the expected information quickly. As a result, Wikipedia articles have become the best bet for most users searching for answers to factual queries on the Web (Singer et al., 2017). The reason is that Wikipedia articles provide credible content in which most claims are supported by references from reputable sources. While Wikipedia is a good source of answers for factual queries, the need for manual editing (crowd-sourcing and editor checking) curbs the growth of its coverage over a broader range of information needs. What if Wikipedia articles could be generated automatically?

In this paper, we introduce a new task, WEBBRAIN, which explores generating short factual articles for queries from a large web corpus. Given a factual query, the goal of the task is to enable a system to mine supporting evidence from the Web and generate a short factual article in which the claims are supported by the mined evidence (defined in Section 3.1). One potential generation target for WEBBRAIN is the first section of a new Wiki page, based on which we can further explore generating long factual articles (e.g., a complete Wiki page). WEBBRAIN can be greatly helpful in various scenarios, including generating Wiki pages for new entities, intelligent writing assistance, and knowledge-intensive QA. Its goal is considered one of the ultimate goals of the future search engine (Metzler et al., 2021). Figure 1 illustrates a case of WEBBRAIN.¹

To establish the data foundation of WEBBRAIN, we construct a large-scale dataset, WebBrain-Raw, from scratch by extracting all English Wikipedia articles and all the corresponding reference articles.
To the best of our knowledge, WebBrain-Raw is the biggest dataset sourced from Wikipedia (about 10× larger than the previous largest peer, WikiSum (Liu et al., 2018); see Section 3.2). Along with WEBBRAIN, we empirically investigate the ability of current state-of-the-art techniques and find that most current models lack the ability to correctly cite references.

[Figure 1 near here: input query "Introduce GPT-3" → Web Articles Retrieval Module → Generative Module ReGen → generated article with citations [1][2].]

WEBBRAIN distinguishes itself from existing tasks in the following aspects: (1) Existing generative models generate text solely from the implicit knowledge stored in their parameters and are therefore prone to producing factually incorrect statements, a phenomenon commonly called hallucination. In contrast, WEBBRAIN aims to generate factual statements grounded in supporting evidence mined from the Web. (2) Retrieval-augmented QA uses a retriever to help answer specific questions whose answers are usually a text span or a single sentence; given a factual query, WEBBRAIN instead explores capturing all the knowledge available on the Web to generate a comprehensive and accurate short article. (3) WEBBRAIN has a more complex pipeline than the multi-document summarization (MDS) task, which focuses on summarizing documents that are already prepared: WEBBRAIN requires models to mine useful evidence before generating the factual article. (4) WebGPT mimics the behavior of a human browsing the Web to answer a specific question, which involves many challenges that are difficult to solve (e.g., collecting training data by recording human behaviors). Instead, WEBBRAIN grounds itself on the large number of Wikipedia articles already created and edited by crowds, which makes the task much more realistic.

In summary, our contributions are threefold: (1) we introduce a new task, WEBBRAIN, dedicated to answering factual queries by generating short factual articles based on evidence retrieved from a large web corpus; (2) we construct a large-scale dataset, WebBrain-Raw, to evaluate the potential of WEBBRAIN; WebBrain-Raw is ten times larger than the previous largest peer dataset; and (3) we empirically analyze the performance of current state-of-the-art techniques on WEBBRAIN and propose our factuality-enhanced framework, ReGen, which outperforms all baselines in both automatic and human evaluation.
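The retrieve-then-generate pipeline described above can be sketched minimally as follows. This is a hypothetical illustration of the task interface only, not the paper's ReGen framework: the toy retriever ranks passages by word overlap, and the placeholder generator simply stitches evidence together with citation markers, whereas a real system would condition a neural retriever and a pre-trained generator on the query.

```python
# Hypothetical sketch of the WEBBRAIN task interface (not ReGen itself):
# (1) retrieve supporting evidence for a factual query from a web corpus,
# (2) generate an article whose claims cite the retrieved evidence.
from dataclasses import dataclass

@dataclass
class Evidence:
    doc_id: int
    text: str

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[Evidence]:
    """Toy retriever: rank corpus passages by word overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: len(q_terms & set(corpus[i].lower().split())),
        reverse=True,
    )
    return [Evidence(i, corpus[i]) for i in ranked[:k]]

def generate(query: str, evidence: list[Evidence]) -> str:
    """Placeholder generator: stitches evidence with [n] citation markers.
    A real system would condition a seq2seq model on query + evidence."""
    return " ".join(f"{ev.text} [{n}]" for n, ev in enumerate(evidence, 1))

corpus = [
    "GPT-3 is an autoregressive language model created by OpenAI.",
    "The Eiffel Tower is located in Paris.",
    "GPT-3 is part of a trend toward pre-trained language representations.",
]
article = generate("Introduce GPT-3", retrieve("Introduce GPT-3", corpus))
print(article)
```

The point of the sketch is the contract between the two modules: the generator sees only evidence the retriever surfaced, so every citation marker in the output can be traced back to a retrieved document.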



¹ The text generation result in Figure 1 is obtained via OpenAI's GPT-3 API: https://beta.openai.com/



Figure 1: Comparison between the standard text generation task and WEBBRAIN.

