Headline Generation

  • Proposer: Ekaterina Kochmar
  • Supervisors: Ekaterina Kochmar
  • Special Resources: Gigaword
  • Applications: abstractive summarisation, reading comprehension testing

    Description

    Headline generation is the task of summarising the content of a text in a single sentence; one can think of generating headlines for news articles or titles for arbitrary texts in this context. It is an example of abstractive summarisation. Recent advances in neural networks and deep learning have substantially improved the performance of abstractive summarisation systems (Rush et al., 2015).

    The starting point for this project would be a reimplementation of the system described in Rush et al. (2015). The paper presents a number of competitive but computationally simpler baseline models; the most successful of these should be considered in this project. For ideas on extensions and improvements to this system, see the concluding section of the paper and the additional background reading. For instance, consider the following directions:

    1. The original paper uses a feed-forward neural network. Try different approaches, e.g. a recurrent neural network or a bidirectional RNN (see Lopyrev (2015) for further ideas, and the sketch after this list);
    2. The original paper essentially performs sentence-level abstractive summarisation: it considers only the first sentence of a news article when generating the article's title. This works reasonably well for news article headlines, because the core information is usually presented in the first few sentences, but the same approach will likely fail on an arbitrary text in which the core information is spread throughout. Consider extending the model to the paragraph and text level (see Tan et al. (2017) for further ideas);
    3. One of the weaknesses of the original approach is that it is agnostic to linguistic structure (see the errors in the output related to identifying the correct subject). Consider adding syntactic and semantic information (see Takase et al. (2016) for further ideas).

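    As a concrete illustration of direction 1, the following is a minimal sketch of a recurrent attention-based encoder-decoder for headline generation, assuming PyTorch. It is not the exact architecture of Rush et al. (2015), whose decoder is a feed-forward neural language model; the vocabulary size, the dimensions and the toy batch are illustrative assumptions, and target shifting for teacher forcing is omitted for brevity.

    ```python
    # Sketch of a recurrent attention-based headline generator (direction 1).
    # Dimensions, vocabulary size and the toy batch are illustrative assumptions.
    import torch
    import torch.nn as nn

    class HeadlineGenerator(nn.Module):
        def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # Bidirectional GRU encoder over the source sentence.
            self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True,
                                  bidirectional=True)
            # Unidirectional GRU decoder producing the headline.
            self.decoder = nn.GRU(emb_dim, 2 * hid_dim, batch_first=True)
            self.out = nn.Linear(4 * hid_dim, vocab_size)

        def forward(self, src, tgt):
            enc_states, _ = self.encoder(self.embed(src))          # (B, S, 2H)
            dec_states, _ = self.decoder(self.embed(tgt))          # (B, T, 2H)
            # Dot-product attention: each decoder state attends over the encoder states.
            scores = torch.bmm(dec_states, enc_states.transpose(1, 2))   # (B, T, S)
            context = torch.bmm(torch.softmax(scores, dim=-1), enc_states)
            return self.out(torch.cat([dec_states, context], dim=-1))    # (B, T, V)

    # Toy usage: a batch of 2 source sentences (length 12) and headlines (length 6).
    # In a real setup the targets would be shifted by one position (teacher forcing).
    model = HeadlineGenerator()
    src = torch.randint(0, 10000, (2, 12))
    tgt = torch.randint(0, 10000, (2, 6))
    logits = model(src, tgt)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), tgt.reshape(-1))
    loss.backward()
    ```

    Training the full system would additionally require a vocabulary built from Gigaword, teacher forcing with shifted targets, and beam-search decoding at test time, as in Rush et al. (2015).
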
    Abstractive summarisation is applicable in a number of contexts, including, for instance, testing reading comprehension. In addition to generating headlines for the news articles in Gigaword, this project will consider generating one-sentence summaries and titles for educational reading material (at the text and paragraph levels), which can then be used to test reading comprehension in humans and will provide a good test for the headline generation system.
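
    For evaluating generated headlines against reference titles, the sketch below computes ROUGE-1 precision, recall and F1 directly from unigram overlap; this is a simplified stand-in for the official ROUGE toolkit used in Rush et al. (2015), and the example strings are invented.

    ```python
    # Minimal ROUGE-1 sketch: unigram overlap between a reference title and a
    # generated headline. Report final results with the official ROUGE toolkit.
    from collections import Counter

    def rouge_1(reference: str, candidate: str) -> dict:
        """Unigram precision, recall and F1 (lower-cased, whitespace-tokenised)."""
        ref = Counter(reference.lower().split())
        cand = Counter(candidate.lower().split())
        overlap = sum((ref & cand).values())          # clipped unigram matches
        precision = overlap / max(sum(cand.values()), 1)
        recall = overlap / max(sum(ref.values()), 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-9)
        return {"p": precision, "r": recall, "f1": f1}

    # Toy usage with an invented reference title and system output.
    print(rouge_1("us stocks rally on strong earnings",
                  "stocks rally after strong earnings report"))
    ```

    The same evaluation can be applied to the provided reading-material corpus once paragraph- and text-level titles have been generated.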


    Additional resources:

    A corpus of reading material with text- and paragraph-level titles will be provided for testing.

    Background reading:

    Rush et al. (2015), A Neural Attention Model for Abstractive Sentence Summarization

    Lopyrev (2015), Generating News Headlines with Recurrent Neural Networks

    Takase et al. (2016), Neural Headline Generation on Abstract Meaning Representation

    Tan et al. (2017), From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach