Streaming histogram sketching for rapid microbiome analytics.

Rowe W. P., Carrieri A. P., Alcon C., Caim S., Shaw A., Sim K., Kroll J. S., Hall L. J., Pyzer-Knapp E. O., Winn M. D. (2019)

Microbiome, 7, 40


The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, the augmentation of novel datasets and the reanalysis of published works. This vast amount of microbiome data, together with the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need for analytics that can process huge amounts of data quickly. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time.
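The abstract does not say which sketching algorithm is used, so the following is only a minimal illustration under stated assumptions. It builds a k-mer spectrum (k-mer to count) by consuming reads one at a time. It then sketches that histogram with Ioffe's improved consistent weighted sampling (ICWS), a standard similarity-preserving sketch for weighted histograms, and compares two samples by the fraction of matching sketch slots, which estimates their weighted Jaccard similarity. The names `stream_spectrum`, `icws_sketch` and `estimate_similarity`, the seed scheme and the sketch size are hypothetical. For brevity the sketch is computed after counting; a truly streaming sketch would update its samples as k-mers arrive.

```python
import math
import random
from collections import Counter


def stream_spectrum(reads, k):
    """Hypothetical helper: build a k-mer spectrum by consuming reads one at a time."""
    spectrum = Counter()
    for read in reads:
        # Count every overlapping k-mer in this read; reads shorter than k add nothing.
        spectrum.update(read[i:i + k] for i in range(len(read) - k + 1))
    return spectrum


def icws_sketch(spectrum, sketch_size, seed=42):
    """Improved consistent weighted sampling (Ioffe 2010) of a weighted histogram.

    Assumption: ICWS stands in for the paper's sketch. Each slot keeps one
    (k-mer, t) pair. The random draws for a given (slot, k-mer) are derived
    deterministically from the seed, so sketches made with the same seed are
    directly comparable across samples.
    """
    if not spectrum:
        raise ValueError("cannot sketch an empty spectrum")
    sketch = []
    for j in range(sketch_size):
        best = None
        for kmer, weight in spectrum.items():
            rng = random.Random(f"{seed}:{j}:{kmer}")  # consistent per (slot, k-mer)
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(weight) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            # The k-mer with the smallest 'a' wins this slot.
            if best is None or a < best[0]:
                best = (a, kmer, t)
        sketch.append((best[1], best[2]))
    return sketch


def estimate_similarity(sketch_a, sketch_b):
    """Fraction of matching slots: an estimate of weighted Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

A dissimilarity can then be taken as `1 - estimate_similarity(a, b)`. Because every sketch has the same fixed size regardless of sequencing depth, comparing two samples costs time proportional to the sketch size rather than to the number of reads.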

