Dr Nabil-Fareed Alikhan

09 August 2018
11:00am

QIB Lecture Theatre

Understanding genomic landscapes in EnteroBase with cgMLST & GrapeTree

Speaker: Dr Nabil-Fareed Alikhan (University of Warwick) will present a seminar entitled "Understanding genomic landscapes in EnteroBase with cgMLST & GrapeTree".

Host: Andrew Page

Abstract:
Raw sequence reads are available in the European Nucleotide Archive (ENA) for >647,000 bacterial genomes. Important goals for such data include identifying groups of genetically related bacteria to facilitate epidemiological tracking or in-depth analyses. However, even these simple goals are difficult to achieve unless the raw data are codified.

We have developed a browser-based tool, EnteroBase (http://enterobase.warwick.ac.uk), which provides biologists, clinicians and epidemiologists with access to genomic assemblies, genotypes and analytical tools. EnteroBase includes consistent, high-resolution genotyping by core genome multi-locus sequence typing (cgMLST) schemes for Salmonella, Escherichia, Yersinia and Clostridioides, together with intuitive visualization of the results in GrapeTree (https://github.com/achtman-lab/GrapeTree) (1).

Phylogenetic analyses based on single nucleotide polymorphisms (SNPs) of up to 1,000 genomes are also available on demand. An initial impression of the benefits of this approach can be found in a recent review article (2). We are already working on combining data from modern genomes with ancient DNA.

EnteroBase contains >150,000 genomes from Salmonella and >70,000 from Escherichia. These are unprecedented troves of data on the diversity within these two genera, and the size of these databases will continue to increase dramatically over the next few years. All read data are checked for quality, assembled and genotyped with a versioned pipeline, ensuring consistency. EnteroBase supports sharing of data within private groups of buddies as well as publishing graphical analyses and datasets for the entire global community. We are also establishing facilities to allow free download of all genomes in EnteroBase via a dedicated server.

MSTree V2 and RapidNJ are implemented within GrapeTree and can identify important clusters of related organisms among 100,000 cgMLST genomes. However, we are already preparing for a future that will encompass orders of magnitude more genomes by developing hierarchical clustering, which will provide persistent and scalable designations, as a general tool for microbial genomics.

All staff from organisations on the Norwich Research Park are welcome to attend.