With the publication of 16 high-quality vertebrate reference genomes, an international team has set the standards for a new era of biodiversity genomics and shown how these genomes will enable research in comparative biology, conservation and health. Published in Nature, the flagship study of the Vertebrate Genomes Project (VGP) is one of the largest standardization efforts in genomics, involving over 50 institutions from 12 countries since it began more than a decade ago.
Building on ten years of work by the Genome 10K Project (G10K) community, which set out to sequence the genomes of 10,000 vertebrate species, and on other comparative genomics efforts around the world, the VGP has taken advantage of the dramatic improvements in sequencing technology of recent years to begin producing high-quality reference genome assemblies, with the goal of covering all of the roughly 70,000 extant vertebrate species.
"This massive comparative genomics project represents a new era of innovation in genome science, developing and using state-of-the-art sequencing, assembly and annotation techniques in new ways, with implications for addressing fundamental questions in comparative biology, genetics and biodiversity conservation," says Tomàs Marquès-Bonet, principal investigator of the Comparative Genomics group at the IBE and member of the VGP Steering Committee. The consortium will also serve as a model for other coordinated genomics projects, such as the Catalan Initiative for the Earth Biogenome Project (CBP), which can take advantage of the extensive infrastructure and expertise of the VGP. Since it began, the project has brought together hundreds of scientists from over 50 institutions across 12 countries.
In a special issue of Nature, alongside complementary articles published simultaneously in other scientific journals, the VGP details numerous technological improvements in genome assembly. In the main article, the VGP shows that it is feasible to set, and meet, high quality standards for the reference genomes of almost all species. With its new approach, the international team combined long-read and long-range sequencing data with new automated assembly algorithms to reconstruct the pieces of each genomic puzzle almost error-free.
"When I was asked to assume the leadership of G10K in 2015, I emphasized the need for more partners and to work on approaches that would produce the highest quality data possible, as the students and postdocs in my own group were taking months to correct the structure of each gene in genome sequences for their experiments", says Erich Jarvis, head of the VGP sequencing centre at Rockefeller University, G10K coordinator and researcher at the Howard Hughes Medical Institute. "For me, that was not just a practical mission but a moral one".
The first genomes analysed have already led to new discoveries with implications for characterizing biodiversity, supporting conservation and improving human health. In particular, the first high-quality reference genomes of six bat species, generated by the Bat 1K consortium, revealed selection on, and loss of, immunity-related genes that are directly relevant to research on emerging infectious diseases such as COVID-19.
As an early large-scale project producing high-quality eukaryotic reference genomes, the VGP has also become the working model for other large consortia, including the Earth Biogenome Project, the Darwin Tree of Life, the Catalan Initiative for the Earth Biogenome Project (CBP) and the European Reference Genome Atlas (ERGA), among others.
So far, the VGP consortium has generated more than one hundred genomes, the most complete assemblies of these species to date. The data have mainly been generated by three sequencing centres committed to the VGP's mission: the Vertebrate Genome Lab at Rockefeller University (New York, USA), partly supported by the Howard Hughes Medical Institute; the Wellcome Sanger Institute (United Kingdom); and the Max Planck Institute (Germany).
As its next step, the VGP will continue to work with networks around the world and with other consortia to complete the first phase of the project, which covers approximately 260 species: one representative species for each vertebrate order, with the orders separated from one another by at least 50 million years since their last common ancestor. The VGP also plans to create genomic resources, including complete genomes, that relate these 260 species to one another and make it possible to understand their evolutionary history in great detail. The second phase, which will focus on one representative species of each vertebrate family, is currently identifying samples and raising funds.
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature (2021). DOI: 10.1038/s41586-021-03451-0.