The immense growth of biological information stored in databases
has led to a critical need for people who can understand the
languages and techniques of both Computer Science and Biology. The
first two lectures will focus on some aspects of Molecular Biology
that are relevant to the course; other biological insights will be
given as examples and applications in other lectures. This course
will discuss algorithms for some important computational problems
in Molecular Biology that are pertinent to Biotechnology and to
the so-called "post-genome era". We shall focus on Hidden Markov
Models and Phylogenetic inference algorithms for Comparative
Genomics. Students will learn how to describe biological processes
using Pi calculus and will be encouraged to think in two ways: How can
Computer Science be useful to Biology? How can Biology be
useful to Computer Science? In order to seed and to stimulate this
approach, part of the course will be based on biological networks
that offer many opportunities for comparing biological and
electronic systems.
Lectures
Introduction to genomics. Basic concepts of molecular
biology and genomics. The human genome.
Introduction to biological networks. Molecular biology
of communication within and between cells. Universality of
network topologies. Cancer.
Sequence alignment. Dynamic programming. Global versus
local. Scoring matrices. The BLAST family of programs.
Significance of alignments.
Hidden Markov Models in Bioinformatics. Definition and
applications in Bioinformatics. Examples of the Viterbi, the
Forward and the Backward algorithms. Parameter estimation for
HMMs. [2 lectures]
Trees. The Phylogeny problem. Distance methods,
parsimony, bootstrap. Stationary Markov processes. Rate matrices.
Maximum likelihood. Felsenstein's post-order traversal. [2 lectures]
Multiple sequence alignment. Aligning more than two
sequences. Genome alignment. Structure-based alignment.
Finding regulatory elements. Finding regulatory
elements in aligned and unaligned sequences. Gibbs sampling.
Introduction to microarray data analysis. Steady state
and time series microarray data. From microarray data to
biological networks. Identifying regulatory elements using
microarray data. [2 lectures]
Pi calculus. Description of biological
networks; stochastic Pi calculus, Gillespie algorithm.
Examples classes
Databases and genome browsers. Databases for sequences
and gene expression, genome browsers. Bioinformatics tools for
Java.
Phylogenetic inference. Phylogeny and HMMs. HIV
phylogeny.
Biological networks. Analysis of cell-cycle or
cancer-related microarray data and network models.
Objectives
At the end of this course students should:
understand the terminology of Bioinformatics and be able to
use it with precision;
understand and be able to use databases, and apply advanced
data analysis techniques in appropriate situations;
be able to describe biological processes using Pi calculus.
Recommended reading
* Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press.
Felsenstein, J. (2003). Inferring phylogenies. Sinauer Associates.
Fall, C., Marland, E., Wagner, J. & Tyson, J. (2002). Computational cell biology. Springer-Verlag Telos (1st ed.).
Bower, J.M. & Bolouri, H. (2001). Computational modeling of genetic and biochemical networks. MIT Press.