A Norwegian Earth BioGenome Project: the initial launch phase (EBP-Nor)
Sequencing Life for the Future of Life
The international Earth BioGenome Project (EBP) was started in 2018 to take advantage of recent advances in sequencing technology. The vision is grand: the goal is to sequence and assemble high-quality reference genomes for all eukaryotic species on Earth. We have recently been funded by the Research Council of Norway to start a national effort in Norway, the EBP-Nor project. The first phase of the project, funded with 30 M NOK (2021-2024), entails setting up the organisation and infrastructure needed for this, and sequencing and assembling a number of selected species (around 150). Digital Life Norway has been involved from the very beginning of the development of the first phase of this large national project.
A tremendous development of sequencing technologies
Sequencing technologies, which enable us to read the DNA in the cells of different species, have developed tremendously since their beginnings. In the early years of DNA sequencing, just a few hundred nucleotides could be read at a time, using time-consuming protocols that required large teams and lasted for years. Today, we can generate multiple gigabases or more in a single experiment lasting a few days. However, genomes often contain repetitive motifs, that is, sequences that are identical or very similar. As a result, putting together reads that were 100-1000 nucleotides long into chromosomes that are around 100 million nucleotides long (depending on species) often led to fragmented results. Usually this meant that chromosome-length assemblies could not easily be reconstructed, and if they could, they would contain many gaps, in other words, regions where the sequence was unknown. With the recent developments in long-range sequencing, however, it is possible to obtain far less fragmented genome assemblies, and at a much lower cost.
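The effect of repeats on read placement can be illustrated with a toy sketch. The sequence, the read lengths and the helper `find_all` below are hypothetical and chosen only for illustration; real assemblers work very differently and at a vastly larger scale. The point is simply that a read lying entirely inside a repeat fits in more than one place, whereas a read long enough to span the repeat and some unique flanking sequence fits in only one.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every start position at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further on so overlapping matches are found too.
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy sequence: the same repeat occurs twice, between different unique flanks.
REPEAT = "ACGTACGTAC"
GENOME = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

short_read = REPEAT[2:8]              # lies entirely inside the repeat
long_read = "TTGCA" + REPEAT + "GG"   # spans the repeat plus unique flanking sequence

short_hits = find_all(GENOME, short_read)
long_hits = find_all(GENOME, long_read)

print(f"short read fits at {len(short_hits)} positions: {short_hits}")
print(f"long read fits at {len(long_hits)} position: {long_hits}")
```

In this toy case the short read cannot be assigned to either copy of the repeat, which is the kind of ambiguity that leaves gaps or breaks in an assembly built from short reads.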
With the advent of long-read sequencing technologies, from companies such as Oxford Nanopore and Pacific Biosciences, chromosome-length assemblies have become feasible. Long reads may have lengths from 15 kb and even up to 100 kb (compared to 100-250 bp for short-read technologies such as Illumina). With the development of circular consensus sequencing (HiFi), the accuracy of long reads is more than 99.5%. In addition, new technologies have been developed that provide even longer-range genome information, such as the chromosome conformation capture technology Hi-C. As a result, we can now routinely create genome assemblies in which all chromosomes are represented intact, with only a few gaps. This is crucial for the high-quality genomes that will be generated by EBP-Nor.
COVID research and genome sequencing
The new sequencing methodology is extremely powerful in many respects. For instance, researchers have compared 450 vertebrate genomes, including 252 mammals, to study the conservation of ACE2, the protein that SARS-CoV-2 binds to (https://www.pnas.org/content/117/36/22311). Animals with ACE2 similar to that of humans can act as reservoirs for the virus in the wild, and knowing which species these might be allows us to investigate them. Having the complete genomes of multiple species, the more the better(!), enables rapid comparative investigation of any gene or protein that might be of interest in outbreaks such as the recent and ongoing COVID-19 outbreak.
Pulling together with “Folkehelsa” on COVID-19
The Norwegian Sequencing Centre (UiO and OUS; https://www.sequencing.uio.no) and CIGENE (NMBU; https://cigene.no), which possess instruments for long-read sequencing, will be the main labs generating the sequencing data. In addition, the sequencing core facilities that make up the national NorSeq consortium (https://www.norseq.org) will be involved (Illumina sequencing). During the COVID-19 pandemic, NorSeq has helped the Norwegian Institute of Public Health (FHI) to substantially increase the capacity in Norway for whole-genome sequencing of SARS-CoV-2 for the surveillance of virus variants. This shows the strength of pulling together at a national scale, and the value of having strong national infrastructures supported by the Research Council of Norway.
A national effort
EBP-Nor is a partnership of the major universities in Norway (UiO, NMBU, UiB, NTNU, Uni Nord and UiT), the research institute SINTEF, and the non-academic institutions REVOcean, the Life Science Cluster, the Norwegian Environment Agency, and ArcticZymes Technologies. The aim is to sequence, assemble and catalogue all eukaryotic species occurring in Norway, estimated at 45,000 species (which is likely a solid underestimate!). We are very happy that EBP-Nor is a partner project in the Digital Life Norway portfolio.
Impacts and the next steps
The availability of (near) complete genomes of more and more species will lay the foundation for breakthroughs in many areas of research and human society. First and foremost, having all genes in a genome properly assembled and annotated (older genome assemblies might have genes spread across multiple pieces of the assembly) will ease all analyses that use these genomes. It will help us to understand, conserve and protect biodiversity. Medical research will have many more research models for any purpose, and drugs and various treatments will be developed on the basis of genome information. The COVID-19 pandemic has taught us quite a bit here (!). EBP-Nor is only in its first (launch) phase. The next phase will involve sequencing many thousands of genomes, will require funding on a totally different scale from what we have now, and will, we believe, have a fundamental impact on Norwegian society.