- Speaker: Dr Zam Iqbal – Research Group Leader at the European Bioinformatics Institute
- Title: Nucleotide resolution analysis of the bacterial pan-genome
- Date of seminar: Thursday 7th November 2019
- Time: 11am to 12pm.
Abstract: When we study evolution of a species, we use different models, depending on what we want to achieve or infer. We might restrict to SNP variation in the “core genome” (presumably inherited vertically) to study phylogeography or to study an outbreak. In reducing the problem to the analysis of SNPs (and invariant sites), it has been possible for researchers to build a range of sophisticated phylogenetic models. However once we try to incorporate genome organisation, chromosomal rearrangements, movement of plasmids, transposons or phage, then the modelling problem is far harder. The question of how to properly model bacterial genetic variation is wide open and extremely challenging.
A prerequisite for any solution to this, is a decision on how to describe the variation in the first place – you cannot model variation until you represent it. Note that this is true even if you have perfect genome assemblies: even if it were possible to multiple sequence align them, this would not really help with how to notice that a SNP at one position in one genome is “the same” as a SNP somewhere else in another.
In this talk, I want to discuss a solution we have been developing to this representation problem. We show how it is possible to represent the pan genome of a species as a network of “floating” graphs, representing the ensemble of known variation in pathology blocks (we use genes and intergenic regions, but this could be done for mobile elements also). In doing so it becomes possible to discover and describe genetic variation at fine (SNP/indel) and coarse (gene order) level. This is a major research theme for my group and I describe progress to date, including results on both illumina and nanopore data
Biography: Zamin Iqbal leads a team at EBI working on fundamental method development for microbial sequence analysis. He works in 3 major areas. First, data structures for indexing and searching DNA archives (Bradley et al, Nature Biotechnology 2019 and https://arxiv.org/abs/1905.09624), allowing for example MLST-typing of all E coli genomes in the European Nucleotide Archive (ENA), predicting drug resistance in all M. tuberculosis, or tracking plasmids across species. Second, variant calling software tailored for bacterial pan-genomes, working with both nanopore and illumina data, and avoid using reference genomes entirely. Third, translation of the above (and other) methods for analysis of M. tuberculosis in public health settings; in particular the Mykrobe tool for drug resistance prediction (Bradley et al, Nature Communications, 2015), analysis of sequence data after direct extraction from sputum (Votintseva et al, J Clin Microbiol 2017), working with collaborators in Madagascar and India to apply these in their settings, and leading the sequence analysis of 100,000 M.tuberculosis genomes for the CRyPTIC project.
Host: Justin O’Grady