Why do we need research infrastructures? For the researcher, the answer is obvious. Whether you work in the lab, sample from a research vessel, or sit in front of a computer, you need proper equipment and tools to conduct your research and obtain results. Why digital? The combination of technological development and digitalization affects the way research is carried out. Eventually, all research projects end up as results expressed as a series of zeroes and ones that need to be organized and visualized. Deciphering whole genome sequences, sub-molecular structures, or the cell’s microcosmos is a tour de force of combining advanced scientific technology and informatics.
This meeting was a starting point for the DLN competence and infrastructure network, which is part of our Work Group 4 activities. The idea for the meeting was to get an overview of important national biological and medical research infrastructures (BMS RIs) and, in addition, an array of expert presentations by groups that are developing methods for modeling biological systems. One major ambition was to learn how the infrastructures are working, how well this serves specialized users in transdisciplinary research projects, and what the needs and possible solutions are for best coordinating the BMS RI “ecosystem”. The program and links to presentations can be found below.
The Work Group 4 leader, Inge Jonassen, introduced the DLN and the work on the competence and infrastructure network, as several participants were not familiar with it. From the Research Council of Norway, Jacob Wang presented the convergence idea behind the establishment of DLN and how the centre can function as a national platform for collaboration now that the third biotech revolution is taking off. To make participants more familiar with DLN projects, project leader Anders Goksøyr presented dCod 1.0: what the project is doing, what it wants to do, and which enabling technologies it will need. The session on DLN finished with a presentation on Responsible Research and Innovation by Dorothy Dankel, who is involved in several BIOTEK projects. She presented her path as a researcher going from the humanities into reflexive systems biology, ELSA and now RRI. The Walk shop concept was also presented, as a way to facilitate interactions between different disciplines in the absence of the normal boundary conditions.
Sessions on national infrastructures
The meeting was organized into sessions with discussions between each. There were sessions on global technologies, namely genomics (NCGC, NGC), sequencing (NSC and NCS-PM), proteomics (PROBE) and metabolomics (NTNU, SINTEF, NNP); on screening technologies (NorOpenscreen, SINTEF), imaging (EuroBioImaging, NALMIN, NORMOLIN), single cell technologies (CellMass) and structural biology (NorStruct, NORCRYST, NNP); and on fermentation (IRIS, SINTEF) and nano- and microfabrication (NorFab). The national bioinformatics infrastructure, ELIXIR Norway, presented its efforts towards medical and marine bioinformatics, as well as the path forward, in which a Norwegian node of ISBE (Infrastructure for Systems Biology in Europe) is also to be included. In addition, big data infrastructures were presented (HBP, UNI, SIGMA2); these are key to storing and sharing data, as well as to data analysis.
Methodologies for digital life
The development of novel methodologies is key to the success of any research project. However, for DLN-like projects, where data harvesting and data analysis are particularly closely coordinated, this process can be complicated. Best practices are needed at multiple levels in the cycle that moves from 1) hypothesis and experiments (sample preparation, design) to 2) data generation or acquisition, 3) analysis and modeling, and 4) validation and refinement (see Figure). At the meeting, several methods and considerations were presented on the experimental side, but even more so for data analysis, representation and visualization, storage (graph databases) and modeling. These methods enable intuitive interactions with data to extract “invisible connections” and integrate different data types. The choice of modeling method has major consequences for which information can be extracted and which data types are needed. The possibilities are endless, but not much is “plug-and-play”. Much can be done to secure the quality and correct use of developed tools, and to establish mechanisms for preserving them.
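As a rough illustration of how a graph representation can surface such “invisible connections” between data types, the sketch below links entities from different data sets and searches for a path between two of them. It is a minimal sketch using plain Python rather than an actual graph database, and all identifiers (`gene:G1`, `protein:P1`, `metabolite:M1`, and so on) are hypothetical placeholders, not data from the meeting.

```python
from collections import deque

# Hypothetical links between entities from different data types
# (e.g. a gene encodes a protein, a protein acts on a metabolite).
EDGES = [
    ("gene:G1", "protein:P1"),
    ("protein:P1", "metabolite:M1"),
    ("gene:G2", "protein:P2"),
    ("protein:P2", "metabolite:M1"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


GRAPH = build_graph(EDGES)
# Two genes with no direct link turn out to be connected via a shared metabolite.
print(shortest_path(GRAPH, "gene:G1", "gene:G2"))
```

In a real project the edges would come from curated databases or experimental results, and a dedicated graph database would typically replace the in-memory dictionary.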
Questions were raised on infrastructure interactions and data management
Visibility of platforms and their contact with users is one issue. It can be difficult to get an overview of available infrastructures at a national scale because information is fragmented. What about user surveys: are platforms conducting them?
Competitive prices for services and ensuring appropriate data richness in modeling-type projects: Should we look at prices in a larger perspective? What is the value of keeping expertise locally or nationally? To generate sufficient data for modeling, bring the computational scientists and the infrastructure into the discussion of the experimental setup early. The same goes for contact with infrastructures: when should projects contact them? An early stage in the project, or already during the application, would be preferred. This would also allow realistic budgeting.
Data sharing and management is a MAJOR issue. From a societal perspective, we should share data and do open science. However, metadata (all needed information about samples, instrumentation and analysis) is often missing or insufficiently described, and we need to be aware of its importance. Should the platforms encourage and help scientists to share data? There was broad agreement in the discussions that platforms and data infrastructures must help researchers do this. Long-term data storage and preservation can be facilitated by ELIXIR, but it must be a joint effort with the researchers. Projects should also budget for data management and for the use of infrastructures that can facilitate it. In a similar vein, how do we ensure the quality and preservation of tools (software) developed in projects?
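To make the metadata point concrete, a platform could run a simple completeness check before a data set is shared. The sketch below is only an illustration under stated assumptions: the three required fields mirror the categories named above (samples, instrumentation, analysis), while the field names, the function `missing_metadata` and the example record are hypothetical and do not follow any particular metadata standard.

```python
# Assumed minimal set of required fields, taken from the categories
# mentioned in the text; a real standard would be far more detailed.
REQUIRED_FIELDS = ("sample", "instrumentation", "analysis")


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Hypothetical record with an empty field and a missing field.
example = {"sample": "liver tissue", "instrumentation": ""}
print(missing_metadata(example))
```

A check like this only catches absent fields; whether the content is sufficiently descriptive still needs review by people who know the experiment.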
The ecosystem of infrastructures came up as an important issue: well-functioning pipelines between different infrastructure types will have immense added value for the end users, in particular for DLN-type projects. DLN could play a role in bridging infrastructures at a national scale. At the European level, initiatives like BioMedBridges, which is now replaced by CORBEL, are looking into this and could be a model for the work ahead in Norway as well.
Use European initiatives to get access to top infrastructure and training
There is currently an open call for research projects to get access to many BMS research infrastructures through the CORBEL portal. We are not good enough at using the opportunities available in the European infrastructure network to get access to state-of-the-art infrastructure and training, to get rapid results for publications or testing, and to bring valuable knowledge back to the research group.
DLN will play an active role in the BMS infrastructure network
Centre for Digital Life Norway aims to build the bridge from biotechnology and biology to data science and computation. On the bio side of the spectrum, there is a focus on advanced scientific equipment and state-of-the-art technology, as well as the skill and competence to utilize the infrastructure. The digital side must be included and considered from the very start of the project, and data handling and analysis must be integrated. The presentations exemplified this integrated approach and showcased the services and competence that are at hand for the research community. There are many different levels at which complex data and biological systems can be interrogated. Investigating the structure of the data using topological data analysis, visualization or graph databases can be one level. Modeling the biological processes using different types of statistical methods, constraint-based modeling or dynamic modeling will require different data sets and levels of detailed knowledge. These considerations will be made in tight collaboration between biologists, mathematicians, computer scientists and the expert service providers of technologies. This is, so to speak, where the “magic” will happen, and we need to find ways to stimulate such interactions. In addition, we need well-functioning interactions between different infrastructures. These questions are not straightforward, and DLN will work on a strategy to identify important issues that need to be solved and point to possible ways of solving them. Getting to know each other is a good way to start, and in that respect the meeting has been an excellent starting point for the work ahead in DLN on important issues related to infrastructure and competence. We are thankful to all who participated with presentations and discussions!
Figure: The figure shows different steps in the data harvesting – analysis – modeling cycle and how different technologies and expertise come into play.
Talks at the meeting
Inge Jonassen – DLN Competence and Infrastructure network
Jacob Wang – Norwegian Infrastructures for Biotechnology
Anders Goksøyr – Needs and challenges in DLN projects - dCod 1.0
Dorothy Dankel – Adventures as a RRI cross-over researcher: Bringing the humanities into research and innovation projects
Kjetill Jacobsen – The Norwegian Sequencing Centre
Ola Myklebost – The Norwegian Cancer Genomics Consortium
Finn Drabløs – Medical bioinformatics (ELIXIR)
Frode Berven – PROBE/NorProteomics
Harald Barsnes – Bioinformatics for Proteomics
Tonje Lien – Penalised regression for integration of gene expression and methylation data
Nils Peder Willassen – Marine Bioinformatics (ELIXIR)
Manuela Zucknick – Predicting drug response in personalized cancer therapy: An example of statistical learning as a technology for digital life
Fatemeh Ghavidel – Statistical methods for integration of omics data
Stefan Bruckner – Data visualization
Arne Smalås – The NorStruct and NORCRYST infrastructures
Paul Berg – NorOpenscreen
Sonia Gavasso – CELLMASS – Single cell analysis
Stein-Erik Gullaksen – CytBase
Oddmund Bakke – The Norwegian EuroBioimaging network & NALMIN
Kay Gastinger – NorFab - National infrastructure for micro- and nanofabrication
Susanne Gitlesen – Fermentation processes and upscaling - Infrastructure for industrial
Geir Klinkenberg – Biotechnology R&D infrastructure platforms at SINTEF
Inge Jonassen – ELIXIR Norway
Eivind Almaas – ISBE - Systems Biology
Per Bruheim – Metabolomics and data integration
Øyvind Halskau – The Norwegian NMR Platform - NMR applications
Hans Ekkehard Plesser – Human Brain Project - European infrastructure for
Alla Sapronova – Center for Big Data Analysis
Arvid Lundervold – In vivo imaging and image analysis