Machine learning for Genomics

Jeroen de Ridder

Research Group

Jeroen de Ridder
We are developing computational strategies, often inspired by machine learning, to analyze and integrate genomics data.
Group name: De Ridder Lab
Research field: Machine learning for Genomics
Algorithms, Genomics, Integrative Omics, Machine Learning

Contact

Universiteitsweg 100
3584 CG
Utrecht
Department / Institute: Center for Molecular Medicine / University Medical Center Utrecht
Office: STR1.305
Building: Stratenum
j.deridder-4@umcutrecht.nl
0887568312
http://deridderlab.nl

Our Research

Genome conformation
The genome is not a straight line. We are developing computational strategies that exploit measurements of genome conformation in the analysis of genomics data. To this end, we build graph-based data integration strategies and exploit large-scale epigenomics datasets. Recently, we have shown that cancer-causing mutations in the mouse genome co-localize in 3D hotspots and are linked to known cancer genes through long-range chromatin interactions [1]. Together with the de Laat lab (Hubrecht Institute), we are designing computational methods to detect multi-way interactions in the 3D genome.

Non-coding mutations
We work on analytical and computational frameworks for fast, cost-efficient and comprehensive detection and annotation of structural variations in cancer genomes. We focus in particular on previously neglected variations in an unexplored region of the cancer genome: the non-coding genome. With these methods we aim to provide an important component of future genome-first clinical decision making for cancer patients, and to drive the discovery of novel cancer genes and mechanisms from modern-day whole-genome sequencing data.

Interpretable machine learning
In this research line we aim to unravel biological mechanisms by investigating how and why trained prediction models fit the data. For instance, we create methods to identify robust gene sets or pathways that differentiate between breast cancer subtypes or cancer treatments. To this end, we employ machine learning models that can exploit existing biological knowledge, such as network- and pathway-based classifiers [3].

Data integration methods
Answering modern biological questions often requires a systems approach, in which multiple genome-wide measurements interrogating multiple biological phenomena need to be integrated. To enable this, we investigate data integration methodologies, in particular those that exploit graphs and graph mining. For instance, we developed so-called scale-aware graph-topological measures [4] that enable rich descriptions of network architecture, and used these to describe DNA-DNA contact maps in the brain [2].

Key publications
[1] Babaei S., et al., de Ridder J. 3D hotspots of recurrent retroviral insertions reveal long-range interactions with cancer genes. Nature Comm. 2015.
[2] Babaei S., et al., de Ridder J.*, Reinders M.* Multi-scale chromatin interactions are predictive for spatial co-expression patterns in the mouse cortex. PLoS Comp. Biol. 2015.
[3] Allahyar A., de Ridder J. FERAL: network-based classifier with application to breast cancer outcome prediction. Bioinformatics. 2015.
[4] Hulsman M., Dimitrakopoulos C., de Ridder J. Scale-space measures for graph topology link protein network architecture to function. Bioinformatics. 2014.
[5] Akhtar W., et al., de Ridder J., …, van Lohuizen M., van Steensel B. Chromatin position effects assayed by thousands of reporters integrated in parallel. Cell. 2013.