Department of Computer Science and Technology

Course pages 2017–18

Biomedical Information Processing

Principal lecturers: Dr Anna Korhonen, Dr Pietro Lio, Dr Nigel Collier
Taken by: MPhil ACS, Part III
Code: R214
Hours: 16
Class limit: 16 students
Prerequisites: Good programming skills (in at least two programming languages, e.g. Python, R, Java)


Research in the biomedical sciences is generating vast amounts of information which, when analysed appropriately, can improve our understanding of the complex processes that govern life, death and disease. This course surveys computational techniques for processing biomedical data, with the overall goal of supporting scientific inquiry, problem solving and decision making in the biomedical sciences. A variety of data types will be introduced, along with data and text mining techniques that can be used to analyse, extract, discover and integrate biomedical information at levels ranging from the molecular to the population level. The course surveys specific problems in biology, clinical medicine and public health, and shows how information processing can support practical applications in these areas.

Syllabus

  • Basic concepts in biomedicine
  • Biomedical databases, tools and data mining
  • Biomedical text data resources, tools and text mining
  • Bio-molecular information processing and applications
  • Clinical and translational information processing and applications
  • Public health information processing and applications
  • Student presentations of practical work or literature surveys in biomedical information processing


On completion of this module, students should:

  • understand key topics in biomedical information processing;
  • be able to locate data and tools for processing biomedical information;
  • understand relevant data standards, metadata and data interoperability;
  • have practice in quantitative data analysis (e.g. sample selection, data summarisation);
  • understand new methods for analysing ‘big’ health data (e.g. in silico hypothesis generation using data and text mining);
  • be able to showcase interesting cross-disciplinary case studies;
  • have experience in solving practical problems using data and text mining techniques.

Coursework and Practical work

Coursework will consist of several practical exercises. First, there will be two exercises on the topics covered in lectures. Second, students will choose between (i) a course practical involving biomedical information processing with data and text mining (details to be determined later) and (ii) a literature survey.

For the literature survey, students will select a topic from a list of topics in biomedical information processing and carry out a survey of state-of-the-art research on it. The survey should be about 10 pages long and based on approximately 8–10 papers. Finally, students will give a short presentation about their work to the rest of the class.

Lecture notes can be found on the Moodle page (only available to Cambridge University staff and students).

Assessment

The two exercises and the presentation will be assessed on a tick basis and together count for 20% of the final mark.

The course practical (to be determined later) or the 10-page literature survey, marked by the lecturers with a percentage score, will account for the remaining 80% of the final mark.

Online submission is available from the Moodle page (only available to Cambridge University staff and students).

Recommended reading

Ira J. Kalet. 2013. Principles of Biomedical Informatics. Academic Press.
David J. Lubliner. 2015. Biomedical Informatics: An Introduction to Information Systems and Software in Medicine and Health. CRC Press.
Hagit Shatkay and Mark Craven. 2012. Mining the Biomedical Literature. The MIT Press.