24 Sep 2019 - 13:30 to 17:30
Minnaert building (room 4.16), Leuvenlaan 4, Utrecht
As part of Driven by Data, and following up on last year's successful topical seminar series on Machine Learning, the UBC is happy to announce a mini-symposium on “Algorithms for Genome Assembly”, to be held on Tuesday, September 24 (13:30-17:30, Minnaert building, room 4.16). Registration is free but mandatory due to the limited capacity of the room!
Register here: https://algo4genomeassembly.eventbrite.nl/
Jens Stoye – “Finding all maximal perfect haplotype blocks in linear time”
Recent large-scale community sequencing efforts allow the identification, at an unprecedented level of detail, of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals’ haplotype data, however, require excessive computing time and are therefore not applicable to current datasets. In 2019, Cunha et al. (Proceedings of BSB 2019) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. Here we give two algorithms that achieve this time bound: the first is conceptually very simple and uses suffix trees; the second uses the positional Burrows-Wheeler Transform and is also very efficient in practice.
Rayan Chikhi – “Question: is de novo genome assembly a solved problem with long
Paola Bonizzoni – “MALVA: Genotyping by Mapping-free Allele Detection of Known Variants”
The amount of genetic variation discovered in human populations is growing rapidly, leading to challenging computational tasks such as variant calling. Standard methods for addressing this problem include read mapping, a computationally expensive procedure; thus, mapping-free tools have been proposed in recent years. These tools focus on isolated, biallelic SNPs, providing limited support for multi-allelic SNPs and short insertions and deletions of nucleotides (indels). Here we introduce MALVA, a mapping-free method to genotype an individual from a sample of reads. MALVA is the first mapping-free tool able to genotype multi-allelic SNPs and indels, even in high-density genomic regions, and to effectively handle a huge number of variants. MALVA requires one order of magnitude less time to genotype a donor than alignment-based pipelines, while providing similar accuracy. Remarkably, on indels, MALVA provides even better results than the most widely adopted variant discovery tools.
Paul Medvedev – “De novo transcriptome reconstruction from long reads”
Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms for long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. A second bottleneck is distinguishing sequencing errors from true variation within these families. I will present two recent methods to address these challenges. The first is IsoCon (Sahlin et al., 2018, Nat Comm), a method to determine the full-length transcripts of multicopy gene families at nucleotide-level precision from PacBio data. I will show how IsoCon was applied to Y chromosome ampliconic gene families, each of which contains many nearly identical gene copies. The second is isONclust (Sahlin & Medvedev, RECOMB 2019), a clustering algorithm that can assign Nanopore reads to their gene family of origin.
Jasmijn Baaijens – “De novo approaches to haplotype-aware genome assembly”
Genomes often come in copies, where each copy stems from one of the ancestors. Due to mutation and recombination events, these copies differ genetically; each copy is called a haplotype. The analysis of haplotypes plays an important role in genetics, medicine, and various other disciplines. We present several approaches for haplotype reconstruction that operate in a “de novo” fashion, meaning that our methods do not require any prior information on the genome content. This type of approach avoids any biases towards pre-known genomes and allows for discovery and assembly of novel haplotypes. We present new techniques to address the computational challenges that come with de novo genome assembly. When combined, our tools form the first de novo approach to full-length viral haplotype reconstruction and achieve results with an accuracy beyond any existing method.