Computer Laboratory

Course pages 2015–16

Biomedical Information Processing

Principal lecturers: Dr Anna Korhonen, Dr Pietro Lio'
Taken by: MPhil ACS, Part III
Code: R214
Hours: 16
Prerequisites: None


Research in the biomedical sciences is generating vast amounts of information which can, when processed appropriately, improve our understanding of the complex processes that govern life, death and disease. This course surveys computational techniques that can be used to process biomedical data with the overall goal of supporting scientific inquiry, problem solving, and decision making in the biomedical sciences. A variety of data types and sources will be introduced, along with data and text mining techniques that can be used to analyse, extract, discover and integrate biomedical information at levels ranging from the molecular to human populations. The course surveys specific problems in biology, clinical medicine and public health and shows how information processing can support practical applications in these areas.


  • Fundamental concepts in biomedicine: biology, clinical health, public health (1 lecture)
  • Biomedical databases and tools, data mining: biomedical databases, genome browsers, biomodels, and their interaction; specialised libraries (BioJava, Biopython, Bioconductor) (2 lectures)
  • Biomedical literature and text mining: scientific and clinical texts, biomedical text processing and resources, techniques for information retrieval and information extraction (2 lectures)
  • Bio-molecular information: identifying and analysing information related to genes, drugs, diseases and nutraceuticals (2 lectures, 1 seminar, 1 practical)
  • Information related to clinical and translational medicine: comparing, sharing and using clinical and translational data among patients and health professionals (2 lectures, 1 seminar)
  • Public health information: gathering public health data from expert and non-expert resources (e.g. scientific publications vs. social media) and using these for epidemiology, health surveillance and social medicine (2 lectures, 1 practical)
  • Student presentations of literature surveys of topics in biomedical information processing (1 session)

Note that some content may vary, and the number of lectures per topic is provisional; the final plan will depend on the students' background and the number of students taking the course.


On completion of this module, students should:

  • have a general understanding of the area of biomedical information processing and an in-depth understanding of selected techniques and topics;
  • understand the main advantages and limitations of the data types and techniques covered during the course;
  • be able to locate, read, understand, and present a research paper from the field;
  • be familiar with current research in a number of aspects of the field.


Practical work

Coursework will consist of several practical exercises. First, there will be two practicals in the course:

  • Practical 1: Processing of bio-molecular information (using Python to explore databases such as NCBI Entrez, KEGG and BioModels; practice with Cytoscape and BioMart)
  • Practical 2: Processing of public health information (cancer and neurodegenerative databases of patients and healthy controls)

Second, students will have a choice between a practical exercise (to be determined later) and a literature survey. For the literature survey, they will select one of a list of topics in biomedical information processing and carry out a survey of state-of-the-art research on this topic. The literature survey should be about 10 pages long and be based on approximately 8–10 papers.

Finally, students will give a short presentation about a chosen biomedical information processing application to the rest of the class.
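
As a rough illustration of the kind of database access involved in Practical 1, the sketch below builds a search request for the NCBI Entrez E-utilities (ESearch) and extracts record identifiers from an XML response. It is not the practical's actual material: the helper names (build_esearch_url, parse_esearch_ids), the query term and the sample response are assumptions made for illustration, and the sample IDs are placeholders rather than real records. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Public base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Return an ESearch URL for the given database and query term.

    Hypothetical helper: db, term and retmax are standard ESearch
    parameters, but the function itself is only an illustration.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_esearch_ids(xml_text):
    """Extract the record identifiers from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_node.text for id_node in root.findall("./IdList/Id")]


# Hand-written stand-in for a response; the IDs are placeholders,
# not real database records.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""

url = build_esearch_url("pubmed", "BRCA1 AND breast cancer")
ids = parse_esearch_ids(SAMPLE_RESPONSE)
print(url)
print(ids)
```

In practice a library such as Biopython wraps these requests; the standard-library version here only shows the shape of a query and of the returned identifier list.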


The module will be assessed as follows:

  • The two course practicals and the literature survey presentation will be assessed in terms of ticks, to count for 20% of the final mark.
  • A practical exercise (to be determined later) or the 10-page literature survey, marked by the lecturers using a percentage score, will account for 80% of the final mark. This will be due two weeks after the end of the module (subject to timetabling).

Recommended reading

Sebastian Bassi. 2012. Python for Bioinformatics. Chapman and Hall/CRC Mathematical and Computational Biology series.

Hagit Shatkay and Mark Craven. 2012. Mining the Biomedical Literature. The MIT Press.