A central challenge in applying artificial intelligence to the life sciences is that real-world biological data are heterogeneous, sensitive, and collected under highly variable conditions. Our group develops foundational machine learning methods for trustworthy AI that address these challenges in complex multimodal and multi-centre datasets, including biomedical imaging, omics data, and plant phenotyping.
Our research is organised around three interconnected themes.
Explainable AI. We design methods that make the reasoning of machine learning models transparent to both machine learning researchers and life-science domain experts. By uncovering what drives model predictions – whether in a clinical scan, a genomic profile, or a plant phenotype – we aim to build systems that scientists can interrogate, interpret, and use to generate new insights.
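As a toy illustration of one such attribution idea (a generic sketch, not a method specific to our group), permutation importance measures how much a model's accuracy drops when a single feature is shuffled, revealing which inputs actually drive its predictions. All names and data below are hypothetical:

```python
import numpy as np

def permutation_importance(model, X, y, feature, n_repeats=10, rng=None):
    """Mean drop in accuracy when one feature column is shuffled.

    A large drop indicates the model relies on that feature."""
    rng = np.random.default_rng(rng)
    base = (model(X) == y).mean()
    drops = []
    for _ in range(n_repeats):
        Xp = X.copy()
        rng.shuffle(Xp[:, feature])          # break the feature-label link
        drops.append(base - (model(Xp) == y).mean())
    return float(np.mean(drops))

# Synthetic example: the label depends only on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
model = lambda X: (X[:, 0] > 0).astype(int)  # a model that uses feature 0 only

imp0 = permutation_importance(model, X, y, feature=0, rng=1)
imp1 = permutation_importance(model, X, y, feature=1, rng=1)
```

Here `imp0` is large and `imp1` is essentially zero, matching the fact that the model ignores the second feature.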
Privacy-preserving machine learning. Biological and clinical datasets are often sensitive and distributed across institutions. We develop approaches such as federated learning, synthetic data generation, and privacy-aware analysis that enable models to be trained collaboratively across multiple datasets while protecting the identity of individuals and the confidentiality of data sources.
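The core aggregation step of federated learning can be sketched in a few lines. The following is a minimal, hypothetical example of federated averaging (FedAvg-style), in which only model parameters leave each site, weighted by local dataset size; it is an illustration of the general technique, not our group's specific implementation:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Combine per-site model parameters into a global model (FedAvg-style).

    Each site contributes in proportion to its dataset size; raw data
    never leaves the institution, only the parameter vectors do."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Hypothetical parameter vectors trained locally at three institutions.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 100, 200]

global_w = federated_average(clients, sizes)
```

In a full federated round, the server would broadcast `global_w` back to the sites, each site would train further on its private data, and the cycle would repeat.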
Out-of-distribution generalisation. Models trained in one setting often fail when applied to new populations, institutions, or experimental conditions. We investigate methods that improve robustness to such distributional shifts, allowing models to remain reliable when deployed in new hospitals, cohorts, species, or growth environments.
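A simple way to see such a distributional shift in data is to compare feature statistics between the training population and a new deployment site. The sketch below (synthetic data, hypothetical "hospital" naming) flags covariate shift via a standardized mean difference per feature:

```python
import numpy as np

def shift_score(train_X, new_X):
    """Per-feature standardized mean difference between training and
    deployment data; large values signal covariate shift that may
    degrade a model trained on the original distribution."""
    mu, sd = train_X.mean(axis=0), train_X.std(axis=0) + 1e-8
    return np.abs(new_X.mean(axis=0) - mu) / sd

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(1000, 3))           # e.g. training hospital
shifted = rng.normal([2.0, 0.0, 0.0], 1.0, (1000, 3))   # new site, feature 0 shifted

scores = shift_score(source, shifted)
```

Only the first feature shows a large score, pointing to the input whose distribution changed between sites; robustness methods aim to keep models reliable despite exactly this kind of shift.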
By addressing interpretability, privacy, and robustness simultaneously, our work aims to enable reliable and trustworthy AI systems that researchers, clinicians, and biologists can use and build upon across the life sciences.