Overview of methodology

Reference

Advaith Siddharthan and Ann Copestake. Generating research websites using summarisation techniques. In Companion Volume to the Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008), Columbus, OH.

Why generate webpages automatically?

In a research organisation such as ours, individual researchers take responsibility for maintaining their own webpages. In addition, researchers are organised into research groups, each of which maintains its own webpage. In this framework, information can get out of date. Most researchers keep their publications lists up-to-date. However, research summaries and the like are updated less often, on individual researcher homepages and on group homepages. Thus reading research profiles does not always give an accurate overview of what is currently being worked on.

Further, the tree structure (lab > research groups > researchers) makes browsing difficult. As each researcher's webpage is maintained individually, links between researchers are often not obvious. A surfer then needs to repeatedly move up and down the tree hierarchy to browse the profiles of different researchers.

In these automatically generated pages, content is extracted from publication titles, and hence stays up-to-date. Information is formatted in a way that facilitates browsing. The left of the screen contains links to researchers of the same group and the middle contains a research profile. In addition, the right of the screen contains a list of recommendations: other researchers with similar research interests. This allows for browsing the website as a graph, without having to repeatedly move up and down the tree hierarchy.

How these webpages are generated

The program starts with a list of publications extracted from individual researchers' webpages; for example:

S. Teufel. 2007. An Overview of evaluation methods in TREC Ad-hoc Information Retrieval and TREC Question Answering. In Evaluation of Text and Speech Systems. L. Dybkjaer, H. Hemsen, W. Minker (Eds.) Springer, Dordrecht (The Netherlands).

From each publication entry such as that above, the program extracts author names, title and year of publication. This is the only information used. The titles are then parsed using the RASP parser and key-phrases are extracted by pattern matching. From the publication entry above, the extracted title produces five key-phrases:

"An overview of evaluation methods in TREC ad-hoc information retrieval and TREC question answering"

evaluation methods
evaluation methods in TREC ad-hoc information retrieval
TREC ad-hoc information retrieval
TREC question answering
information retrieval
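
The extraction step can be sketched roughly as follows. This is a minimal illustration and not the program's actual code: the regular expression assumes entries shaped like "Authors. Year. Title. Rest", and the function-word chunker is a crude stand-in for the RASP parse and pattern matching described above. The names ENTRY_RE, BOUNDARY_WORDS, parse_entry and candidate_phrases are hypothetical.

```python
import re

# Assumed entry shape: "Authors. Year. Title. Rest". Real publication lists
# vary widely and would need more patterns than this one.
ENTRY_RE = re.compile(
    r"^(?P<authors>.+?)\.\s+(?P<year>(?:19|20)\d{2})\.\s+(?P<title>.+?)\.(?:\s|$)"
)

# Illustrative function words used as chunk boundaries (not RASP's grammar).
BOUNDARY_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "with"}


def parse_entry(entry):
    """Return (authors, year, title), or None if the entry does not match."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        return None
    return match.group("authors"), int(match.group("year")), match.group("title")


def candidate_phrases(title):
    """Split a title at boundary words and keep chunks of two or more words."""
    phrases, chunk = [], []

    def flush():
        if len(chunk) >= 2:
            phrases.append(" ".join(chunk))
        chunk.clear()

    for word in title.split():
        if word.lower() in BOUNDARY_WORDS:
            flush()
        else:
            chunk.append(word)
    flush()  # emit the final chunk, which has no trailing boundary word
    return phrases
```

For the Teufel entry above, this sketch recovers the author, year and title, and three of the five listed key-phrases (with the title's original capitalisation). The longer phrase spanning "in" and the embedded phrase 'information retrieval' need a real syntactic parse.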

Individual researcher pages

To create a webpage for an individual researcher, the key-phrases from all the paper titles authored by that researcher are clustered together - an example cluster is shown below:

automatic classification for information retrieval, intelligent automatic information retrieval, information retrieval test collections, information retrieval system, automatic classification, semantic classification, intelligent retrieval, information retrieval, information science, retrieval system, test collections, mail retrieval, trec ad-hoc information retrieval

A representative phrase is selected from each cluster ('information retrieval' from the cluster above), and this phrase is associated with the publication dates of all the papers from which the terms in the cluster come. The representative phrases are then enumerated as lists in five-year intervals; for example:

2000--2004: 'human iris patterns'; 'iris recognition'; 'visual phase information'; 'complex-valued wavelets for stochastic pattern recognition'; 'biometric decision landscapes'; 'gabor wavelets'; 'brain theory'; 'epigenetic randomness'; 'statistical richness'; 'statistical principles'; 'brain metaphor'; 'statistical pattern recognition';
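
The clustering criterion and the rule for choosing a representative phrase are not spelled out here, so the following is only a sketch under stated assumptions: phrases are linked when they share a word, the representative is the phrase that occurs inside the most cluster members, and intervals start at years divisible by five (consistent with the 2000--2004 example). All function names are hypothetical.

```python
from collections import defaultdict


def cluster_by_shared_word(phrases):
    """Group phrases into connected components linked by a shared word (assumed criterion)."""
    clusters = []  # each item is a (set of words, list of phrases) pair
    for phrase in phrases:
        words = set(phrase.split())
        merged_words, merged_phrases = set(words), [phrase]
        remaining = []
        for cluster_words, cluster_phrases in clusters:
            if cluster_words & words:
                # The new phrase bridges this cluster, so fold it in.
                merged_words |= cluster_words
                merged_phrases = cluster_phrases + merged_phrases
            else:
                remaining.append((cluster_words, cluster_phrases))
        clusters = remaining + [(merged_words, merged_phrases)]
    return [cluster_phrases for _, cluster_phrases in clusters]


def representative(cluster):
    """Pick the phrase contained (on word boundaries) in the most cluster members."""
    def contains(longer, shorter):
        return f" {shorter} " in f" {longer} "

    return max(cluster, key=lambda p: sum(contains(q, p) for q in cluster))


def five_year_label(year):
    """Label the five-year interval containing `year`, e.g. 2003 -> '2000--2004'."""
    start = year - year % 5
    return f"{start}--{start + 4}"


def profile_by_interval(clusters_with_years):
    """Map each interval label to the representative phrases active in it.

    `clusters_with_years` is a list of (cluster, publication_years) pairs.
    """
    intervals = defaultdict(list)
    for cluster, years in clusters_with_years:
        phrase = representative(cluster)
        for label in sorted({five_year_label(year) for year in years}):
            intervals[label].append(phrase)
    return dict(intervals)
```

Applied to the example cluster above, the containment rule selects 'information retrieval', which occurs inside six of the thirteen phrases. A shared-word criterion would also link phrases through function words such as "for", so a practical version would probably filter those first.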

Recommendations (related people)

Recommendations for related people are generated by comparing the terms extracted between 2000 and 2008 for each researcher. The seven most similar researchers are shown in tabular form, along with a list of terms from those researchers' profiles that are relevant to the researcher being viewed; for example:

Related People

Glenford Mapp [ DTG ]:
- 'autonomic systems'; 'data transport'; 'heterogeneous networking'; 'mobile networking'; 'mobility management'; 'mobility support'; 'networked surface'; 'synchronisation primitive'; 'virtual memory mapped communications';
Robert K Harle [ DTG ]:
- 'autonomous environment discovery'; 'discovering reflective surfaces'; 'low-power location sensor'; 'positioning systems'; 'robust fiducial tracking'; 'smart environments'; 'world models';
Frank Stajano [ Security ][ DTG ]:
- 'autonomic systems'; 'location privacy'; 'malicious mobile code'; 'mobility support'; 'sentient car';
Ian J Wassell [ DTG ]:
- 'fast start-up equaliser'; 'location privacy'; 'mobility management';
Alastair Beresford [ SRG ][ DTG ]:
- 'location privacy'; 'map generation'; 'mobile agents'; 'sentient computing';
David Ingram [ SRG ]:
- 'augmented reality'; 'sentient environment'; 'synchronisation primitive'; 'virtual memory mapped communications';
Jonathan J Davies [ DTG ]:
- 'map generation'; 'road network'; 'sentient transport';
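
The similarity measure used for recommendations is not stated above. As one plausible sketch, the code below scores two researchers by the Jaccard overlap of the words in their terms and treats a term as relevant when it shares a word with the viewed researcher's terms. Both choices, and the names term_words, similarity and recommend, are assumptions made for illustration.

```python
def term_words(terms):
    """Collect the individual words that occur in a set of terms."""
    return {word for term in terms for word in term.split()}


def similarity(terms_a, terms_b):
    """Jaccard overlap between the word sets of two term lists (assumed measure)."""
    words_a, words_b = term_words(terms_a), term_words(terms_b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def recommend(viewed, profiles, top_n=7):
    """Return up to `top_n` (researcher, relevant_terms) pairs for `viewed`.

    `profiles` maps a researcher name to the terms extracted for 2000-2008.
    """
    viewed_words = term_words(profiles[viewed])
    scored = [
        (similarity(profiles[viewed], terms), name)
        for name, terms in profiles.items()
        if name != viewed
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    results = []
    for score, name in scored[:top_n]:
        if score == 0:
            continue  # nothing in common, so not worth recommending
        relevant = sorted(
            term for term in profiles[name] if viewed_words & set(term.split())
        )
        results.append((name, relevant))
    return results
```

Matching on shared words rather than identical terms is what lets a term such as 'mobility management' count as relevant to someone whose profile contains 'mobility support'.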

Research Group Pages

Group pages are produced by summarising the pages of researchers in the group. Terms are clustered according to who is working on them. The group page is presented as a list of clusters. This presentation highlights how group members collaborate with each other, and for each term shows the relevant researchers, making navigation easier; for example:

'cognitive dimensions'; 'domestic appliances'; 'motion captured dance'; 'solid diagrams'; 'accessible technology'; 'physical world'; 'visual programming'; 'end-user software engineering'; 'invented traditions'; 'biomechanically-based spinal model'; 'tangible user interfaces';
Alan Blackwell; Cecily Morrison; Darren Edge; Lorisa Dubuc; Luke Church;

'histogram warping'; 'non-uniform b-spline subdivision'; 'stylised rendering'; 'multiresolution image representation'; 'human behaviour'; 'multi-view autostereoscopic'; 'subdivision schemes'; 'minimising gaussian curvature variation near extraordinary vertices'; 'sampled cp surfaces'; 'bounded curvature variants';
Neil Dodgson; Thomas Cashman; Ursula Augsdorfer;

'text for multiprojector tiled displays'; 'tabletop interface'; 'high-resolution tabletop applications'; 'distributed tabletops'; 'remote review meetings'; 'rapid prototyping';
Peter Robinson; Philip Tuddenham;

'students with visual disabilities using high-resolution photography'; 'assistive tool'; 'lecture adaptation'; 'photonote evaluation'; 'time-lapse photography';
Gregory Hughes; Peter Robinson;
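
How terms are clustered "according to who is working on them" is not specified in detail. The sketch below takes the simplest reading, grouping terms that have exactly the same set of researchers; the actual program may well use a looser criterion. The function name group_page and the ordering of the clusters are assumptions.

```python
from collections import defaultdict


def group_page(profiles):
    """Cluster a group's terms by the exact set of researchers working on them.

    `profiles` maps a researcher name to an iterable of terms. Returns a list
    of (terms, researchers) pairs, with larger collaborations listed first.
    """
    workers = defaultdict(set)  # term -> researchers whose profile has it
    for researcher, terms in profiles.items():
        for term in terms:
            workers[term].add(researcher)

    clusters = defaultdict(list)  # set of researchers -> their shared terms
    for term, researchers in workers.items():
        clusters[frozenset(researchers)].append(term)

    pages = [
        (sorted(terms), sorted(researchers))
        for researchers, terms in clusters.items()
    ]
    # Sort by number of researchers (descending), then by their names.
    pages.sort(key=lambda page: (-len(page[1]), page[1]))
    return pages
```

Under this reading, one researcher can appear in several clusters, as Peter Robinson does in the example above.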

Disclaimer: All content has been generated automatically by a program running on the titles of publications extracted from people's webpages.
©2007 Advaith Siddharthan, Computer Laboratory, University of Cambridge
Please send any comments on this page to as372@cl.cam.ac.uk