Bioinformatics
Lecture notes:
- Complete lecture slides (slides only) (large file: 100 Mb)
- Example questions and answers (additional material in the supervisor section)
Videos (and implementation examples):
To aid comprehension, the short videos, from 2021, closely follow the slide content; each lecture is split into 15-20 minute videos.
Implementations of the algorithms explained in the lecture notes (non examinable): short programs, mostly in Python. ALL THE PAPERS/CODE BELOW ARE NON EXAMINABLE MATERIAL; THEY MAY BE USEFUL FOR FURTHER INSIGHT INTO BIOINFORMATICS.
Lecture1 (Video): Introduction and Biological concepts (NON EXAMINABLE) (large file: 10 Mb)
- Data Repositories (with link to human reference genome)
- Genome sequence of SARS-CoV-2 (NC_045512.2) - non examinable
- Biopython (tutorial) - non examinable
- MinION brochure from Oxford Nanopore
Lecture1 (Video): Longest common subsequence (large file: 10 Mb)
Lecture2 (Video): Global and Local alignment (large file: 10 Mb)
- Implementation: Needleman-Wunsch alignment.py
- Implementation: Smith-Waterman alignment.py
- Implementation: Example pairwise alignment.py
- Implementation: Affine Gap.py (from Rosalind.info)
- Implementation: Pam250.py
- Note on affine gaps (P Pevzner)
- Note on affine gaps (R. Durbin)
Lecture2 (Video): Linear Time alignment (Hirschberg) (large file: 10 Mb)
Lecture2: Four Russians speedup; RNA folding (large file: 10 Mb)
- Notes on Four Russians speedup from Gusfield
- Notes on Four Russians speedup from Michal Ziv-Ukelson
- Implementation: RNA_Nussinov.py
- Durbin et al: RNA prediction (Nussinov algorithm)
- Implementation: RNA_Four Russians cpp
- Implementation: RNA_Four Russians hpp
- Implementation: RNA_Four Russians main cpp
Lecture3 (Video): Building trees (large file: 10 Mb)
Lecture3 (Video): Additive Phylogeny (large file: 10 Mb)
Lecture3 (Video): UPGMA and Neighbor-Joining (large file: 10 Mb)
- Implementation: upgma.py
- Implementation: squared_error_distortion.py
- Implementation: neighbor_joining.py
- Implementation: limb_length.py
Lecture3 (Video): Small and Large Parsimony; trees and multiple alignment (large file: 10 Mb)
- Implementation: parsimony.py
- Implementation: clustal.py
- BLOSUM mutational matrix (similar to PAM)
- BLOSUM mutational matrix (similar to PAM), description
- Some explanations of phylogenetic algorithms
Lecture4 (Video): Genome Sequencing (large file: 10 Mb)
Lecture4 (Video): De Bruijn Graph (large file: 10 Mb)
Lecture4 (Video): De Bruijn Pairs (large file: 10 Mb)
Lecture5 (Video): Lloyd algorithm for clustering (large file: 10 Mb)
- Implementation: Lloyd.py
- Implementation: farthest_first_travers.py
- Implementation: hierarchical_clustering.py
Lecture5 (Video): Expectation-Maximisation (large file: 10 Mb)
Lecture5 (Video): Markov Clustering Algorithm (large file: 10 Mb)
- Implementation: mcl_clustering.py
- Leiden and Louvain algorithms
- Leiden and Louvain in single-cell analysis
Lecture6 (Video): Genome Assembly (large file: 10 Mb)
Lecture6 (Video): Burrows-Wheeler Transform (large file: 10 Mb)
- Note on Burrows-Wheeler Transform, FM index, suffix arrays
- Implementation: Burrows-Wheeler Transform.py
- Implementation: Burrows-Wheeler Transform with example.py
- Implementation: trie.py
- Implementation: suffix tree.py
Lecture7 (Video): Hidden Markov Models: Viterbi (large file: 10 Mb)
Lecture7 (Video): Hidden Markov Models: Forward and Backward (large file: 10 Mb)
- Implementation: Forward and Backward.py
- Implementation: TMHMM.py Transmembrane protein segments prediction
- Notes on protein classification using HMM (Non examinable)
Lecture8 (Video): How to compute with DNA (large file: 10 Mb)
Lecture9 (Video): How to use DNA as memory storage (large file: 10 Mb)
Lecture10 (Video): How to simulate genetic and protein reaction networks: Doob-Gillespie (large file: 17 Mb)
Lecture11 (Video): Revision; Textbook reference, Revision note 1, Revision note 2, Revision note 3 (large files: 10 Mb)
Lecture12 (Video): Example Class (I am happy to organise an online example class with students) (large file: 10 Mb)
- Link to Bioinformatics Example class slides only
- Guideline exam answers (main takeaway: the questions are general enough that there is always something to write)
- Past exam questions
Notes
- Note on privacy and ethics when using bioinformatics for sensitive data analysis
- Ten rules to read a scientific paper
- What to stress in a Bioinformatics job interview
Additional References: original papers (PDF) (not necessary for the exams!)
- Algorithms for the Longest Common Subsequence Problem by Hirschberg
- Four Russians implementation on string edit problem by Masek and Paterson
- Computational prediction of eukaryotic protein-coding genes by M. Zhang
- Algorithms for Loop Matching by R. Nussinov et al.
- A Block-sorting Lossless Data Compression Algorithm by Michael Burrows and David Wheeler
- Opportunistic data structures with applications by Paolo Ferragina and Giovanni Manzini
- Suffix Arrays: A New Method for On-Line String Searches by Manber and Myers
- Adleman's original paper
- DNA storage: random access
- DNA storage: review paper
- Daniel Gillespie paper (1976)
- Daniel Gillespie paper (1977)