Guidelines for public database submission of uncultivated virus genome sequences for taxonomic classification

Adriaenssens EM, Roux S, Brister JR, Karsch-Mizrachi I, Kuhn JH, Varsani A, Yigang T, Reyes A, Lood C, Lefkowitz EJ, Sullivan MB, Edwards RE, Simmonds P, Rubino L, Sabanadzovic S, Krupovic M, Dutilh B. (2023)

Nature Biotechnology, 41, 898-902


Mining data derived from high-throughput DNA or RNA sequencing approaches, including metagenomics, has led to the discovery of a multitude of uncultivated virus genome sequences [1-12]. These sequences improve our knowledge about the representation of the global virosphere and fuel the expansion and refinement of virus taxonomy. Incorporation of these newly discovered viral sequences into high-quality reference databases adds a bottleneck to virology. For formal taxonomic classification, International Committee on Taxonomy of Viruses (ICTV) guidelines stipulate that genome sequences must be available from a public database. However, the correct use of nomenclature and the inclusion of standardized metadata fields are just as important as the availability of sequence data to enable the use and reuse of the data by the global research community. Here, we present standards and recommendations for the submission of virus genome sequence data to public databases for the purpose of taxonomic classification. These represent a conceptual and practical extension to the Minimum Information about an Uncultivated Virus Genome (MIUViG) standards that include guidelines for reporting the virus origin, genome quality, genome annotation, taxonomic classification, biogeographic distribution and host prediction [13]. Aspects of these standards have been reiterated in a recently published consensus viewpoint statement indicating that viruses inferred from metagenomic sequences require strict quality control before they can be used for taxonomic assignments [14]. The guidelines presented here focus on the MIUViG standards on genome quality and expand on the naming of sequences and their submission to public databases.



