Computer Laboratory

Diarmuid Ó Séaghdha

Exploring variation across biomedical subdomains


Tom Lippincott, Diarmuid Ó Séaghdha, Lin Sun and Anna Korhonen

Previous research has demonstrated the importance of handling differences between domains such as “newswire” and “biomedicine” when porting NLP systems from one domain to another. In this paper we identify the related issue of subdomain variation, i.e., differences between subsets of a domain that might be expected to behave homogeneously. Using a large corpus of research articles, we explore how subdomains of biomedicine vary across a variety of linguistic dimensions and discover that there is rich variation. We conclude that an awareness of such variation is necessary when deploying NLP systems for use in single or multiple subdomains.

  author = 	 {Tom Lippincott and Diarmuid {\'O S\'eaghdha} and Lin Sun and Anna Korhonen},
  title = 	 {Exploring variation across biomedical subdomains},
  booktitle = 	 {Proceedings of the 23rd International Conference on Computational Linguistics (COLING-10)},
  year =	 2010,
  address =	 {Beijing, China}