Computer Laboratory

Technical reports

Sentiment analysis of scientific citations

Awais Athar

June 2014, 114 pages

This technical report is based on a dissertation submitted April 2014 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Girton College.

Some figures in this document are best viewed in colour. If you received a black-and-white copy, please consult the online version if necessary.

Abstract

While interest in sentiment analysis for different text genres has grown over the past few years, relatively little emphasis has been placed on extracting opinions from scientific literature and, more specifically, from citations. Citation sentiment detection is an attractive task because it can help researchers identify shortcomings and detect problems in a particular approach, determine the quality of a paper for ranking in citation indexes by including negative citations in the weighting scheme, and recognise issues that have not been addressed as well as possible gaps in current research approaches.

Current approaches assume that the sentiment present in the citation sentence represents the true sentiment of the author towards the cited paper and do not take further informal mentions of the citations elsewhere in the article into account. There have also been no attempts to evaluate citation sentiment on a large corpus.

This dissertation focuses on the detection of sentiment towards the citations in a scientific article. The detection is performed using the textual information from the article. I address three sub-tasks and present new large corpora for each of the tasks.

Firstly, I explore different feature sets for detection of sentiment in explicit citations. For this task, I present a new annotated corpus of more than 8,700 citation sentences which have been labelled as positive, negative or objective towards the cited paper. Experimenting with different feature sets, I show that the best result, a micro-F score of 0.760, is obtained using n-grams of length and dependency relations.
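As a reading aid for the micro-F score mentioned above, the following is a minimal sketch of how a micro-averaged F score can be computed for three-way labels (positive, negative, objective). It is not the author's evaluation code: the function name micro_f1 and the toy label lists are hypothetical and made up for illustration, and the report may compute its scores differently.

```python
LABELS = ("positive", "negative", "objective")


def micro_f1(gold, predicted, labels=LABELS):
    """Micro-averaged F1: pool TP/FP/FN over the given labels, then compute F1 once."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1  # predicted this label and it was correct
            elif p == label and g != label:
                fp += 1  # predicted this label but it was wrong
            elif g == label and p != label:
                fn += 1  # missed a true instance of this label
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up labels for illustration only (not data from the report).
gold = ["objective", "objective", "positive", "negative", "objective"]
pred = ["objective", "positive", "positive", "objective", "objective"]
score = micro_f1(gold, pred)
```

When every sentence has exactly one label and all labels are pooled, as in this sketch, the micro-averaged F score coincides with plain accuracy. Restricting the labels argument to a subset, for example only the positive and negative classes, gives a different value.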

Secondly, I show that the assumption that sentiment is limited only to the explicit citation is incorrect. I present a citation context corpus where more than 200,000 sentences from 1,034 paper-reference pairs have been annotated for sentiment. These sentences contain 1,741 citations towards 20 cited papers. I show that including the citation context in the analysis increases the subjective sentiment by almost 185%. I propose new features which help in extracting the citation context and examine their effect on sentiment analysis.

Thirdly, I tackle the task of identifying significant citations. I propose features which help discriminate these from citations in passing, and show that they provide statistically significant improvements over a rule-based baseline.

Full text

PDF (4.4 MB)

BibTeX record

@TechReport{UCAM-CL-TR-856,
  author      = {Athar, Awais},
  title       = {{Sentiment analysis of scientific citations}},
  year        = 2014,
  month       = jun,
  url         = {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-856.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  number      = {UCAM-CL-TR-856}
}