Computer Laboratory

Diarmuid Ó Séaghdha

Unsupervised learning of rhetorical structure with un-topic models


Diarmuid Ó Séaghdha and Simone Teufel

In this paper we investigate whether unsupervised models can be used to induce conventional aspects of rhetorical language in scientific writing. We rely on the intuition that the rhetorical language used in a document is general in nature and independent of the document’s topic. We describe a Bayesian latent-variable model that implements this intuition. In two empirical evaluations based on the task of argumentative zoning (AZ), we demonstrate that our generality hypothesis is crucial for distinguishing rhetorical from topical language, and that features provided by our unsupervised model, trained on a large corpus, can improve the performance of a supervised AZ classifier.

  author = 	 {Diarmuid {\'O S\'eaghdha} and Simone Teufel},
  title = 	 {Unsupervised learning of rhetorical structure with un-topic models},
  booktitle = 	 {Proceedings of the 25th International Conference on Computational Linguistics (COLING-14)},
  year =	 2014,
  address =	 {Dublin, Ireland}