# Dr Thomas Brouwer

From 2014 to 2017 I was a PhD student under Pietro Lio' in machine learning and bioinformatics at the Computer Laboratory, University of Cambridge, where I also obtained my BA in Computer Science in 2014.

My research was focused on developing Bayesian matrix factorisation models for analysing and integrating biological datasets. In particular, I studied the impact of the inference method, the prior and likelihood choices, and how to best integrate multiple datasets. My applications were mainly focused on drug sensitivity prediction and gene expression datasets.

I have since left academia. Feel free to reach out to me with any questions though!

## News

- I passed my PhD viva! (30 November)
- Paper got accepted to ECML 2017, titled "Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation"!
- Paper got accepted to AISTATS 2017, titled "Bayesian Hybrid Matrix Factorisation for Data Integration"!
- Presented a poster at the NIPS Workshop on Advances in Approximate Bayesian Inference, titled "Fast Bayesian Nonnegative Matrix Factorisation and Tri-Factorisation" (9 December 2016).
- Visited Professor Samuel Kaski's group at Aalto University, Helsinki, and gave a talk "Hybrid matrix factorisation" (5-12 November 2016).
- Visited Professor Samuel Kaski's group at Aalto University, Helsinki, and gave a talk "Bayesian data integration by multiple matrix tri-factorisation" (11-18 May 2016).
- Presented at the 7th Workshop on Complex Networks, CompleNet 2016, titled "FactorNet: network analysis for biclusters identification" (23-25 March 2016).
- Gave lecture at Cambridge University, as part of the Research Students Lectures series (17 November 2015).
- Passed first year viva of the PhD! (10 August 2015).

## Research

My research is focused on developing Bayesian probabilistic models for drug development. I am focusing on three application areas:

- **Drug sensitivity prediction** - predicting how sensitive different (cancer) cell lines are to different drugs, given other sensitivity values, and features about the drugs (chemical structure, primary targets) and cell lines (gene expression profile, copy number variations, mutation data).
- **Drug synergy** - predicting whether two drugs work better together than by themselves, again using drug and cell line features.
- **Drug repositioning** - given information about which drugs work well on a number of diseases, try to infer which other drug-disease associations might work well. This is especially useful for rare diseases, without any good treatment options or extensive research to find them.

To achieve this, I focus on models that use probability distributions to describe the problems in terms of random variables, and use **Bayesian inference** to infer the distributions of these random variables after observing a dataset. Bayesian methods tend to be more resistant to noise and overfitting than non-probabilistic approaches. I am currently specialising in **matrix factorisation** methods, which can be used to predict missing values in datasets, and extending them to incorporate other datasets such as features and similarity information between drugs and cell lines. I am also interested in how the choice of inference method and Bayesian prior affects predictive performance.
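As a rough illustration of how matrix factorisation can predict missing values, here is a minimal sketch using regularised alternating least squares on the observed entries only. It is not one of the Bayesian models from the papers listed below: it computes a point estimate, where the ridge penalty corresponds to a MAP estimate under zero-mean Gaussian priors and Gaussian noise. The function name `factorise`, its parameters (`k`, `lam`, `iters`, `seed`), and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def factorise(R, mask, k=2, lam=0.1, iters=50, seed=0):
    """Fit R ~ U @ V.T using only the entries where mask is True.

    Hypothetical helper for illustration: alternating ridge regressions,
    one per row of U and one per row of V.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(m, k))
    ridge = lam * np.eye(k)
    for _ in range(iters):
        # Update each row factor from the columns observed in that row.
        for i in range(n):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[i, obs])
        # Update each column factor from the rows observed in that column.
        for j in range(m):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, j])
    return U, V


# Synthetic rank-2 data with roughly 30% of the entries hidden (assumed setup).
rng = np.random.default_rng(1)
R = rng.normal(size=(20, 2)) @ rng.normal(size=(15, 2)).T
mask = rng.random(R.shape) < 0.7

U, V = factorise(R, mask)
pred = U @ V.T  # predictions for every entry, including the hidden ones
train_rmse = np.sqrt(np.mean((pred - R)[mask] ** 2))
heldout_rmse = np.sqrt(np.mean((pred - R)[~mask] ** 2))
print(f"train RMSE {train_rmse:.3f}, held-out RMSE {heldout_rmse:.3f}")
```

A Bayesian treatment would replace the point estimates `U` and `V` with posterior distributions (for example via Gibbs sampling or variational inference), which also gives uncertainty estimates for the predicted entries.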

## Presentations and notes

##### Papers

- **Prior and Likelihood Choices for Bayesian Matrix Factorisation on Small Datasets** - arXiv.
- **Hybrid Bayesian Matrix Factorisation for Data Integration** - arXiv, paper, supplementary, AISTATS 2017.
- **Fast Bayesian Nonnegative Matrix Factorisation and Tri-Factorisation** - arXiv, paper, supplementary, NIPS 2016 Workshop on Advances in Approximate Bayesian Inference.

##### Conferences

- **European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2017)** - paper, presentation, and poster titled *Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation* - 21 September 2017.
- **20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017)** - paper and poster titled *Bayesian Hybrid Matrix Factorisation for Data Integration* - 21 April 2017.
- **NIPS 2016 Workshop on Advances in Approximate Bayesian Inference** - poster titled *Fast Bayesian Nonnegative Matrix Factorisation and Tri-Factorisation* - 9 December 2016.
- **CompleNet 2016: 7th Workshop on Complex Networks** - 15-minute presentation titled *FactorNet: network analysis for biclusters identification* - 23 to 25 March 2016.
- **Big Data in Medicine: Exemplars and Opportunities in Data Science** - poster titled *Identifying effective drugs for cancer* - 19 June 2015.

##### Summer schools

- **Microsoft PhD Summer School** - poster titled *Probabilistic models for improving drug development* - 29 June to 3 July 2015.
- **Kyoto Machine Learning Summer School** - poster, spotlight presentation titled *Bayesian non-negative matrix tri-factorisation* - 23 August to 4 September 2015.

##### Other talks

- **Hybrid matrix factorisation** - 7 November 2016 - slides - 30-minute talk in Professor Samuel Kaski's group, Aalto University, Helsinki, Finland.
- **Bayesian data integration by multiple matrix tri-factorisation** - 12 May 2016 - slides - 30-minute talk in Professor Samuel Kaski's group, Aalto University, Helsinki, Finland.
- **Matrix factorisation and extensions** - 4 February 2016 - slides - 30-minute talk for Bioinformatics group.
- **Introduction to Bayesian inference** - 17 November 2015 - slides - 1-hour lecture for Research Students Lectures series.

##### Notes

- **Probabilistic non-negative matrix factorisation and extensions** - literature review (unfinished!)

## Code (GitHub)

All my Python code is publicly available on my **GitHub account**.

- **Prior and likelihood choices for Bayesian matrix factorisation on small datasets** - implementations of sixteen different Bayesian matrix factorisation models with different likelihood and prior choices.
- **Comparative study of inference methods for Bayesian nonnegative matrix factorisation** - implementations of four inference approaches to matrix factorisation, as well as automatic relevance determination.
- **Bayesian hybrid matrix factorisation for data integration** - implementation of the Bayesian hybrid matrix factorisation model; inference using Gibbs sampling.
- **Fast Bayesian non-negative matrix factorisation and tri-factorisation** - implementations of non-probabilistic inference, variational inference, Gibbs sampling, and iterated conditional modes.
- **K-means clustering with missing values** - for datasets with missing values.