
Department of Computer Science and Technology



Course pages 2023–24


Principal lecturer: Prof Pietro Lio'
Taken by: Part II CST
Term: Michaelmas
Hours: 12
Format: In-person lectures
Suggested hours of supervisions: 3
Exam: Paper 8 Question 2; Paper 9 Question 2


This course focuses on algorithms used in Bioinformatics and Systems Biology. Most of the algorithms are general and can be applied in other fields to multidimensional and noisy data. All the biological terms and concepts needed for the course and the examination will be given in the lectures. The most important software implementing the described algorithms will be demonstrated.


  • Introduction to biological data: Bioinformatics as an interesting field in computer science. Computing and storing information with DNA (including Adleman’s experiment).
  • Dynamic programming. Longest common subsequence, DNA global and local alignment, linear space alignment, Nussinov algorithm for RNA, heuristics for multiple alignment. (Vol. 1, chapter 5)
  • Sequence database search. BLAST. (See notes and textbooks)
  • Genome sequencing. De Bruijn graph. (Vol. 1, chapter 3)
  • Phylogeny. Distance-based algorithms (UPGMA, Neighbour-Joining). Parsimony-based algorithms. Examples in Computer Science. (Vol. 2, chapter 7)
  • Clustering. Hard and soft K-means clustering, use of Expectation Maximization in clustering, Hierarchical clustering, Markov clustering algorithm. (Vol. 2, chapter 8)
  • Genomics pattern matching. Suffix trees, string compression and the Burrows-Wheeler Transform. (Vol. 2, chapter 9)
  • Hidden Markov Models. The Viterbi algorithm, profile HMMs for sequence alignment, classifying proteins with profile HMMs, soft decoding problem, Baum-Welch learning. (Vol. 2, chapter 10)
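
As an illustration of the dynamic programming topic listed above, the longest common subsequence problem can be sketched in a few lines of Python. This is a minimal sketch rather than course material: the function name `lcs` and the example strings are assumptions chosen for illustration, and the quadratic-space table is the simplest variant, not the linear-space one mentioned in the syllabus.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (illustrative sketch)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one string or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Backtrack from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    # Hypothetical DNA fragments, chosen only for illustration.
    print(lcs("AACCTTGG", "ACACTGTGA"))
```

The same table-filling pattern underlies global and local alignment, where the unit match score is replaced by a scoring matrix and gap penalties.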


At the end of this course students should

  • understand Bioinformatics terminology;
  • have mastered the most important algorithms in the field;
  • be able to work with bioinformaticians and biologists;
  • be able to find data and literature in repositories.

Recommended reading

* Compeau, P. and Pevzner, P.A. (2015). Bioinformatics algorithms: an active learning approach. Active Learning Publishers.
Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press.
Jones, N.C. and Pevzner, P.A. (2004). An introduction to bioinformatics algorithms. MIT Press.
Felsenstein, J. (2003). Inferring phylogenies. Sinauer Associates.