Computer Laboratory

Technical reports

The language of collaborative tagging

Theodosia Togia

September 2015, 203 pages

This technical report is based on a dissertation submitted September 2015 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Trinity Hall.

Some figures in this document are best viewed in colour. If you received a black-and-white copy, please consult the online version if necessary.


Collaborative tagging is the process whereby people attach keywords, known as tags, to digital resources, such as text and images, in order to render them retrievable in the future. This thesis investigates how tags submitted by users in collaborative tagging systems function as descriptors of a resource’s perceived content. Using computational and theoretical tools, I compare collaborative tagging with natural language description in order to determine whether or to what extent the former behaves as the latter.

I start the investigation by collecting a corpus of tagged images and exploring the relationship between a resource and a tag using theories from different disciplines, such as Library Science, Semiotics and Information Retrieval. Then, I study the lexical characteristics of individual tags, suggesting that tags behave as natural language words. The next step is to examine how tags combine when annotating a resource. I show that similar combinatory constraints hold for tags assigned to a resource and for words as used in coherent text. This finding raises the question of whether the similar combinatory patterns between tags and words are due to implicit semantic relations between the tags. To provide an answer, I conduct an experiment asking humans to submit both tags and textual descriptions for a set of images, constructing a parallel corpus of more than one thousand tags-text annotations. Analysis of this parallel corpus provides evidence that a large number of tag pairs are connected via implicit semantic relations, whose nature is described. Finally, I investigate whether it is possible to automatically identify the semantically related tag pairs and make their relationship explicit, even in the absence of supporting image-specific text. I construct and evaluate a proof-of-concept system to demonstrate that this task is attainable.

BibTeX record

  author =	 {Togia, Theodosia},
  title = 	 {{The language of collaborative tagging}},
  year = 	 2015,
  month = 	 sep,
  url = 	 {},
  institution =  {University of Cambridge, Computer Laboratory},
  number = 	 {UCAM-CL-TR-875}