Department of Computer Science and Technology

Technical reports

Data-driven representations in brain science: modelling approaches in gene expression and neuroimaging domains

Tiago M. L. Azevedo

July 2022, 136 pages

This technical report is based on a dissertation submitted 13 February 2022 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Churchill College.

Some figures in this document are best viewed in colour. If you received a black-and-white copy, please consult the online version if necessary.

DOI: 10.48456/tr-973


The assumptions made before modelling real-world data greatly affect performance in machine learning tasks. It is therefore paramount to find a good data representation in order to develop successful machine learning models. When no considerable prior assumption exists about the data, values are represented directly in a “flattened”, 1-Dimensional vector space. However, it is possible to go one step further and capture more complex relational patterns: for example, a Graph-Dimensional space is used to illustrate a more structured way to represent data and their relational inductive bias.

This thesis focuses on these two computational data dimensions across two scales of human biology: the micro scale of molecular biology using gene expression data, and the macro scale of neuroscience using neuroimaging data. Different modelling approaches will be explored to understand how one can model and represent high-dimensional brain data according to the specific needs of the applied fields at these two scales. Specifically, two approaches will be developed for Graph-Dimensional data. Firstly, specific and shared genetic profiles that generalise to external datasets will be extracted by applying multilayer co-expression networks across 49 human tissues. Secondly, a novel deep learning model will be introduced to leverage the entirety of resting-state fMRI data (i.e., spatial and temporal dynamics), as opposed to previous approaches in the literature that simplify and condense this type of data; its robustness on an external multimodal dataset and its explainability capabilities will also be illustrated. For 1-Dimensional data, an interpretable model will be developed for understanding cognitive factors using multimodal brain data.

Overall, the research in this thesis explores explainable data-driven representations and modelling approaches across the multidisciplinary scientific fields of machine learning, molecular biology, and neuroscience. It also helps highlight the contributions of these fields when modelling the brain and its intra- and inter-dynamics across the human body.

Full text

PDF (5.9 MB)

BibTeX record

@TechReport{UCAM-CL-TR-973,
  author =      {Azevedo, Tiago M. L.},
  title =       {{Data-driven representations in brain science: modelling
                  approaches in gene expression and neuroimaging domains}},
  year =        2022,
  month =       jul,
  url =         {},
  institution = {University of Cambridge, Computer Laboratory},
  doi =         {10.48456/tr-973},
  number =      {UCAM-CL-TR-973}
}