Why data management?
Research relies on data that is produced or collected during the research process, or on existing data that is analysed and built on in the context of new research questions. In particular, with significant advances in high-throughput technologies such as genomics, proteomics, and sequencing, and the consequent growth of biological data, you may encounter challenges related to the huge volumes of data produced. For example, how do you deal with (big) data storage? How do you share data between different groups? How do you track different versions of your datasets? How do you set standards and choose the best formats for optimal data reuse?
These questions need to be addressed properly through a data management plan.
Data Management Workshop
This workshop was a starting point for the DLN competence and infrastructure working group 4. The idea was to provide a general, supporting framework of guidance, tools and infrastructures for successful data management and sharing. The workshop opened with a pair of invited presentations from data management infrastructures. Natalie Stanford from the University of Manchester presented FAIRDOM, and Kjell Petersen from the Computational Biology Unit in Bergen presented NeLS. The two platforms are adapted in particular towards Systems Biology and Omics data, respectively.
Natalie started her presentation with the FAIR concept (data should be Findable, Accessible, Interoperable and Reusable), and then introduced several tools for implementing standards in data management, e.g., COPASI, software for simulation and analysis of biochemical reactions and networks, and RightField, a tool for annotating spreadsheets. She gave an introduction to the FAIRDOM platform, SEEK and openBIS, and to a typical pipeline for research groups. At the end, the SEEK user interface was shown.
Kjell talked about the national infrastructure NeLS (Norwegian e-infrastructure for Life Sciences) within ELIXIR.no, which is designed to support a broad range of end-users in Norway with varying bioinformatics backgrounds. Researchers and scientists affiliated with one of the universities in Norway can easily use NeLS. He also described the design requirements and gave an overview of the tiered architecture used in NeLS. For computing, NeLS uses Galaxy because of its concept of a history element (capturing parameters and tool versions), the organization of tools into workflows, the sharing of histories and workflows, and the possibility of exporting and importing data.
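The idea of a history that captures parameters and tool versions, and that can be exported and imported, can be illustrated with a small Python sketch. This is not Galaxy's API or its export format; the class names `HistoryStep` and `History`, the field names, and the tool name `example_aligner` are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HistoryStep:
    """One analysis step: which tool ran, in which version, with which settings."""
    tool: str
    tool_version: str
    parameters: dict
    inputs: list
    outputs: list


@dataclass
class History:
    """An ordered record of analysis steps (hypothetical, not Galaxy's data model)."""
    name: str
    steps: list = field(default_factory=list)

    def record(self, tool, tool_version, parameters, inputs, outputs):
        """Append a step so the analysis can later be inspected or repeated."""
        step = HistoryStep(tool, tool_version, parameters, inputs, outputs)
        self.steps.append(step)
        return step

    def to_json(self):
        """Export the whole history as JSON text that can be shared."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Import a history that was exported with to_json()."""
        data = json.loads(text)
        steps = [HistoryStep(**step) for step in data["steps"]]
        return cls(name=data["name"], steps=steps)


# Usage: record one step, export it, and import it again elsewhere.
history = History("demo analysis")
history.record(
    tool="example_aligner",          # hypothetical tool name
    tool_version="1.0.0",
    parameters={"threads": 4},
    inputs=["reads.fastq"],
    outputs=["aligned.bam"],
)
shared_copy = History.from_json(history.to_json())
```

Because each step stores the tool version and parameters next to the input and output names, a collaborator who receives the exported record can see exactly how a result was produced.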
Session related to DLN projects
Participants from DLN projects gave short introductions to their projects: the types of data that will be produced, the handling of metadata, the current status of data management, and their solutions or needs for sharing data and community standards.
Alexander Wentzel, project leader of the INBioPharm project, described the high volume of datasets produced on different omics platforms and the resulting challenges in data storage and integration. The project is using SEEK and aims to expand that line of work.
Anders Goksøyr, project leader of dCod, described the project and the types of data that are being and will be produced, and explained that the team needs a solution for data storage as well as standards for data. The dCod team is interested in testing SEEK and NeLS in the project.
Fabian Grammes (Work Package leader) presented the Digital Salmon project. The project is a partner of FAIRDOM, and Digital Salmon has assigned one person to get training and to load processed data into SEEK.
Marianne Fyhn, project leader of DigiBrain, described the project and the many different data types and data analyses involved. They are collaborating with NorStore for data storage and are developing an in-house data management tool. They are working to share data and make it accessible, but data and modeling standards seem to be somewhat limited in neuroscience.
Discussions on data management tools and needs
One of the main points of discussion was how to deal with data management (DM) in DLN. No particular DM system has been chosen so far for DLN projects; however, well-functioning systems such as SEEK or NeLS, which can fit most projects, could be a reasonable solution. Integration of SEEK and NeLS into a local DM platform was also suggested, which needs more discussion and consideration. Perhaps a proper description of a DM plan should be required in future DLN calls. Identifying a dedicated person within each project to get the necessary training in DM was also proposed, and this was accepted by all projects.
To start using SEEK or NeLS in the research projects, there is a need for training through workshops, hands-on sessions or courses. The competence and infrastructure work group is interested in following up on this need.
This first data management workshop led by the competence and infrastructure work group ended on a positive note. Attendees said that they had learned the general concepts and would like to understand the broader perspectives and skills needed to manage their research data. Hands-on sessions on applying the data management tools to data and models were of particular interest.