NSU researcher part of a flagship study on vertebrate genomes
- Unprecedented novel discoveries have implications for characterizing biodiversity for all life, conservation, and human health and disease.
  - This finding provides novel avenues of research to increase immune defenses, particularly relevant for emerging infectious diseases, such as the current COVID-19 pandemic.
FORT LAUDERDALE/DAVIE, Fla. – Two decades ago, the full genome sequence of humankind was released. It was funded by international government and philanthropic sources at a cost of billions of dollars.
Fast forward to 2008 and, driven by the need for better genome understanding and the precipitous drop in sequencing costs, the Genome 10K Community of Scientists (G10K) was established to promote and ensure the genome analysis of 10,000 species of vertebrates. The G10K-sponsored Vertebrate Genomes Project embraced dramatic improvements in sequencing bio-technologies in the last few years to expand production of high-quality reference genome assemblies for all ~70,000 living vertebrates in the coming years.
Today, the G10K-sponsored Vertebrate Genomes Project (VGP) announces its flagship study and associated publications focused on genome assembly quality and standardization for the field of genomics. This study includes 16 diploid, high-quality, near error-free, and near-complete vertebrate reference genome assemblies for species across all taxa with backbones (i.e., mammals, amphibians, birds, reptiles, and fishes) from five years of piloting the first phase of the VGP.
In a special issue of Nature, along with simultaneous companion papers published in other scientific journals, the VGP details numerous technological improvements based on these 16 genome assemblies. In this new study, the VGP demonstrates the feasibility of setting and achieving high-quality reference genome metrics using a state-of-the-art automated approach that combines long-read sequencing and long-range chromosome scaffolding with novel algorithms that put the pieces of the genome assembly puzzle together. The current VGP pipelines have led to the submission of 129 diploid assemblies representing the most complete and accurate versions of those species to date, and the project is on the path to generating thousands of genome assemblies, demonstrating feasibility in not only quality standardization but also scale.
Some of the animals that were part of this study included, but were not limited to:
- Mammals: Pale spear-nosed bat; Egyptian fruit bat; Canada lynx; vaquita; platypus;
- Birds: Zebra finch; kakapo; Anna’s hummingbird;
- Reptile: Goode’s thornscrub tortoise;
- Fish: Zig-zag eel; climbing perch; blunt-snouted clingfish.
“When we first started the G10K idea, we gathered a small handful of diverse field zoologists together with genome-centric computer scientists, pledging to work together to develop genome sequence data for thousands of the world’s vertebrates,” said Stephen O’Brien, Ph.D., a professor and research scientist at Nova Southeastern University’s (NSU) Halmos College of Arts and Sciences. “We wanted to offer a gift for the next generation of genome scientists. Today the dream of genome empowerment of so many living species took a giant leap forward.”
O’Brien is the co-founder of the Genome 10K Consortium, the Chief Scientific Officer at the Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, Russia, and a member of the National Academy of Sciences.
The G10K-VGP’s approach combines assembly pipelines with manual curation to fix misassemblies, major gaps, and other errors, which informs the iterative development of better algorithms. For example, the VGP helped reveal high levels of false gene duplications, losses or gains, due mostly to algorithms not properly separating maternal and paternal chromosomes. One solution includes a trio binning approach of using DNA from the parents to separate out the paternal and maternal sequences in the offspring. For cases where parental data is unavailable, another solution developed by the VGP and collaborators is an algorithm called FALCON-Phase that reduces the computational complexity of phasing maternal and paternal DNA sequences at chromosome scale.
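As a rough illustration of the trio binning idea described above, the following Python sketch sorts an offspring's reads into maternal and paternal bins using short DNA words (k-mers) that appear in only one parent. It is a toy example under stated assumptions, not the VGP pipeline: the function names, the k-mer length, and the example reads are hypothetical, and production tools must also handle reverse complements, sequencing errors, and genome-scale data.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets.

    K-mers shared by both parents carry no information about
    which parent a read came from, so they are discarded.
    """
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring read to the parent whose unique k-mers it shares most."""
    read_kmers = kmers(read, k)
    mat_hits = len(read_kmers & mat_only)
    pat_hits = len(read_kmers & pat_only)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    return "unassigned"  # no parent-specific evidence, or a tie


# Hypothetical toy data: the parents differ at a single base.
K = 4  # illustrative value only; real analyses use much longer k-mers
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)
offspring_reads = ["GTACGT", "GTTCGT", "ACGT"]
bins = [bin_read(r, mat_only, pat_only, K) for r in offspring_reads]
```

Once reads are binned this way, each bin can be assembled separately, which is what keeps the maternal and paternal copies of a gene from being mistaken for a duplication.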
“When I was asked to take on leadership of the G10K-VGP in 2015, I emphasized the need to work with technology partners and genome assembly experts on approaches that produce the highest quality data possible, as it was taking months per gene for my students and postdocs to correct gene structure and sequences for their experiments, which was causing errors in our biological studies,” said Erich Jarvis, lead of the VGP sequencing hub at The Rockefeller University, Chair of the G10K and a Howard Hughes Medical Institute Investigator. “For me this was not only a practical mission, but a moral imperative.”
Kerstin Howe, lead of the curation team at the Wellcome Sanger Institute in the UK, said: “Our new approach to produce structurally validated, chromosome-level genome assemblies at scale will be the foundation of ground-breaking insights in comparative and evolutionary genomics.”
“It truly was a challenge to design a pipeline applicable to highly diverged genomes – our largest genome, 5GB in size, broke almost every tool commonly used in assembly processes,” said Arang Rhie, from the National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, who is the first author of the flagship paper. “The extreme level of heterozygosity or repeat contents posed a big challenge. This is just the beginning; we are continuously improving our pipeline in response to new technology improvements.”
Adam Phillippy, chair of the VGP genome assembly and informatics working group of more than 100 members and head of the Genome Informatics Section of the National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA, added: “Completing the first vertebrate reference genome, human, took over 10 years and $3 billion dollars. Thanks to continued research and investment in DNA sequencing technology over the past 20 years, we can now repeat this amazing feat multiple times per day for just a few thousand dollars per genome.”
Specific to conservation and in collaboration with the Māori in New Zealand and officials in Mexico, genomic analyses of the kākāpō, a flightless parrot, and the vaquita, a small porpoise and the most endangered marine mammal, respectively, suggest evolutionary and demographic histories of purging harmful mutations in the wild. The implication of these long-term small population sizes at genetic equilibrium gives hope for these species’ survival.
Richard Durbin, a Professor at the University of Cambridge and lead of the VGP sequencing hub at the Wellcome Sanger Institute in the UK, said: “These studies mark the start of a new era of genome sequencing that will accelerate over the next decade to enable genomic applications across the whole tree of life, changing our scientific interactions with the living world.”
The G10K-VGP consortium involves hundreds of international scientists working together from more than 50 institutions in 12 different countries since the VGP was initiated in 2016 and is exemplary in its scientific cooperation, extensive infrastructure, and collaborative leadership. Additionally, as the first large-scale eukaryotic genomes project to produce reference genome assemblies meeting a specific minimum quality standard, the VGP has become a working model for other large consortia, including the Bat 1K, Global Invertebrate Genome Alliance-GIGA, Pan Human Genome Project, Earth BioGenome Project, Darwin Tree of Life, and European Reference Genome Atlas, among others.
“The VGP project is at the vanguard of the creation of a genomic catalog in analogy with Linnaeus’ classification of life,” said Gene Myers, lead of the VGP sequencing hub at the Max Planck Institute in Dresden, Germany. “I and my colleagues in Dresden are excited to be contributing superb genome reconstructions with the funding of the Max-Planck Society of Germany.”
As a next step, the VGP will continue to work collaboratively across the globe and with other consortia to complete Phase 1 of the project, which covers approximately one representative species for each of the 260 vertebrate orders, each separated by a minimum of 50 million years from a common ancestor with the other species in Phase 1. The VGP intends to create comparative genomic resources with these 260 species, including reference-free whole genome alignments, that will provide a means to understand the detailed evolutionary history of these species and create consistent gene annotations. Genome data are primarily generated at three sequencing hubs that have invested in the mission of the VGP, including The Rockefeller University’s Vertebrate Genome Lab, New York, USA; Wellcome Sanger Institute, UK; and Max Planck Institute, Germany.
Phase 2 will focus on representative species from each vertebrate family and is currently in the process of sample identification and fundraising. The VGP has an open-door policy and welcomes others to join its efforts, ranging from fundraising and sample collection to generating genome assemblies or including their own genome assemblies that meet the VGP metrics as part of our overall mission.
The VGP collaborated with and tested many protocols from genome sequencing companies, some of whose scientists are also co-authors of the flagship study, including from Pacific Biosciences, Oxford Nanopore Technologies, Illumina, Arima Genomics, Phase Genomics, and Dovetail Genomics. The VGP also collaborated with DNAnexus and Amazon to generate a publicly available VGP assembly pipeline and host the genomic data in the Genome Ark database. The genomes, annotations and alignments are also available in international public genome browsing and analyses databases, including the National Center for Biotechnology Information Genome Data Viewer, Ensembl genome browser, and UC Santa Cruz Genomics Institute Genome Browser. All data are open source and publicly available under the G10K data use policies.
Other novel biological discoveries from the 16 genomes in the flagship paper, and 25 genomes total from over 20 papers in this first wave of publications include:
- Corrections of false gene or chromosome losses, where previous assemblies missed between 30% and 50% of GC-rich protein-coding gene regulatory regions, which were considered to belong to the ‘dark matter’ of the genome;
- Newly identified chromosomes in the zebra finch and platypus;
- Complete and error-free mitochondrial genomes for most species, some generated in single molecule sequences without the need for assembly;
- Wild sex chromosome evolution in monotreme mammals and birds;
- Genetic variations between humans and marmosets that have implications for marmosets as an emerging non-human primate model system for biomedical research;
- Lineage-specific changes shaping the evolution of bird and mammal genomes: duck, emu, platypus, and echidna; and
- Proposal for a universal evolution-based revised nomenclature for the oxytocin and vasotocin ligand and receptor families.
Links to all of the reports related to this package can be found on Nature’s website.
About Nova Southeastern University (NSU): At NSU, students don’t just get an education, they get the competitive edge they need for real careers, real contributions and real life. A dynamic, private research university, NSU is providing high-quality educational and research programs at the undergraduate, graduate, and professional degree levels. Established in 1964, the university includes 15 colleges, the 215,000-square-foot Center for Collaborative Research, the private JK-12 grade University School, the world-class NSU Art Museum Fort Lauderdale, and the Alvin Sherman Library, Research and Information Technology Center, one of Florida’s largest public libraries. NSU students learn at our campuses in Fort Lauderdale, Fort Myers, Jacksonville, Miami, Miramar, Orlando, Palm Beach, and Tampa, Florida, as well as San Juan, Puerto Rico, and online globally. With nearly 200,000 alumni across the globe, the reach of the NSU community is worldwide. Classified as having “high research activity” by the Carnegie Foundation for the Advancement of Teaching, NSU is one of only 59 universities nationwide to also be awarded Carnegie’s Community Engagement Classification, and is also the largest private institution in the United States that meets the U.S. Department of Education’s criteria as a Hispanic-serving Institution. Please visit http://www.
The Vertebrate Genome Laboratory at The Rockefeller University: The Rockefeller University is one of the world’s leading biomedical research universities and is dedicated to conducting innovative, high-quality research to improve the understanding of life for the benefit of humanity. The university’s 70 laboratories conduct research in neuroscience, immunology, biochemistry, genomics, and many other areas. The Vertebrate Genome Laboratory (VGL) at the Rockefeller University specializes in long-read genomic technologies. The VGL is one of the three VGP sequencing hubs. It is equipped with cutting-edge genomic technologies including several Pacific Biosciences and Oxford Nanopore sequencers, a Bionano Genomics Saphyr optical mapper, and a 10x Genomics Chromium microfluidics instrument. Composed of a team of experts in long reads and ultra-high-molecular-weight DNA, the VGL strives to find a way to decipher life’s blueprint from any samples, even the most challenging ones. Using state-of-the-art technologies and extensive international collaborations, we are devoted to filling the gap between field scientists and geneticists. We are particularly proud to play our small part in the effort of reversing species extinction by sequencing genomes of endangered species before it is too late. Please visit http://vertebrategenomelab.
The Wellcome Sanger Institute: The Institute is a world-leading genomics research center, where scientists undertake large-scale research that forms the foundations of knowledge in biology and medicine. The institute is open and collaborative; the data, results, tools and technologies are shared across the globe to advance science. Its ambition is vast – the institute takes on projects that are not possible anywhere else, using the power of genome sequencing to understand and harness the information in DNA. Funded by Wellcome, the institute has the freedom and support to push the boundaries of genomics. The research findings are used to improve health and to understand life on Earth. Please visit http://www.
The MPI-CBG: The Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) in Dresden, Germany is one of more than 80 institutes of the Max Planck Society, an independent, non-profit organization in Germany. 500 curiosity-driven scientists from over 50 countries ask: How do cells form tissues? The basic research programs of the MPI-CBG span multiple scales of magnitude, from molecular assemblies to organelles, cells, tissues, organs, and organisms.
National Human Genome Research Institute, National Institutes of Health: NHGRI is one of the 27 institutes and centers at the National Institutes of Health, focused on advances in genomics research. Building on its leadership role in the initial sequencing of the human genome, NHGRI collaborates with the world’s scientific and medical communities to enhance genomic technologies that accelerate breakthroughs and improve lives. Empowering and expanding the field of genomics can benefit all of humankind. Please visit https:/
Howard Hughes Medical Institute: The Howard Hughes Medical Institute plays an important role in advancing scientific research and education in the United States. Its scientists, located across the country and around the world, have made important discoveries that advance both human health and our fundamental understanding of biology. The Institute also aims to transform science education into a creative, interdisciplinary endeavor that reflects the excitement of real research. HHMI’s headquarters are located in Chevy Chase, Maryland, just outside Washington, DC.
San Diego Zoo Wildlife Alliance: The San Diego Zoo Wildlife Alliance is a nonprofit international conservation leader, committed to inspiring a passion for nature and creating a world where all life thrives. The Alliance empowers people from around the globe to support their mission to conserve wildlife through innovation and partnerships. San Diego Zoo Wildlife Alliance supports cutting-edge conservation and brings the stories of their work back to the San Diego Zoo and San Diego Zoo Safari Park – giving millions of guests, in person and virtually, the opportunity to experience conservation in action. The work of San Diego Zoo Wildlife Alliance extends from San Diego to strategic and regional conservation “hubs” across the globe, where their strengths–via their “Conservation Toolbox,” including the renowned Wildlife Biodiversity Bank–are able to effectively align with hundreds of regional partners to improve outcomes for wildlife in more coordinated efforts. By leveraging these tools in wildlife care and conservation science, and through collaboration with hundreds of partners, San Diego Zoo Wildlife Alliance has reintroduced more than 44 endangered species to native habitats. Each year, San Diego Zoo Wildlife Alliance’s work reaches over 1 billion people in 150 countries via news media, social media, their websites, educational resources and the San Diego Zoo Kids channel, which is in children’s hospitals in 13 countries. Success is made possible by the support of members, donors and guests to the San Diego Zoo and San Diego Zoo Safari Park, who are Wildlife Allies committed to ensuring All Life Thrives.
UC Santa Cruz Genomics Institute: Comprising diverse researchers from a variety of disciplines across academic divisions, the UC Santa Cruz Genomics Institute leads UC Santa Cruz’s efforts to unlock the world’s genomic data and accelerate breakthroughs in health and evolutionary biology. Our platforms, technologies, and scientists unite global communities to create and deploy data-driven, life-saving treatments and innovative environmental and conservation efforts. We are revealing life’s code™. Please visit genomics.ucsc.edu for more information.