DLN can help you with Data Management
High-throughput omics technologies, including genomics, transcriptomics, proteomics and metabolomics, have advanced significantly, and the amount of available biological (omics) data has grown accordingly. As a result, you may encounter new challenges in the storage, preservation and integration of data, as well as in their pre-processing and analysis. For example, how and where should big data be stored? How can data be shared between groups in a project? How can you track different versions of your datasets and analyses? Which standards and formats should be used to allow data reuse?
Data management (DM) is important for responsible research. It includes the storage, archiving and preservation of your research data, including the required metadata. Before starting a new research project, issues related to DM should be addressed through a data management plan (DMP). This saves you a lot of time and effort later on. In addition, it assures you that the data you produce will be preserved in a clear, findable and usable format. The goal of DM is to ensure that data are treated according to the FAIR principles: data should be Findable, Accessible, Interoperable and Re-usable.
Where do I keep my data? How do I share my data? Which tools should be used to deposit my data?
Researchers are often unsure which facilities and tools exist, and which ones are best to use. Therefore, the data management coordinator in work group 4 (Competence and Infrastructure) of Digital Life Norway (DLN) works to provide data management support for DLN researchers and projects. DLN projects are free to use any DM system. However, we work to identify and support well-functioning systems that fit most DLN projects, including systems biology, systems medicine, mathematical modelling, and the storage and analysis of (high-volume) omics data.
To this end, we held our first Data Management workshop for DLN projects on 23-24 November in Trondheim. The workshop was organized together with the RRI team and offered a general introduction to two tools and infrastructures for DM: the SEEK (https://fairdomhub.org/) and NeLS (https://nels.bioinfo.no/) platforms, which are tailored to systems biology and omics data, respectively.
SEEK is a web-based resource platform for managing and sharing scientific research datasets and models in projects within FAIRDOMHub. It can also be downloaded and installed locally. SEEK is based on the ISA (Investigation, Study, Assay) format and is flexible in terms of the type of data stored. You can point to local data storage while keeping the metadata or smaller derived datasets in SEEK. In addition, SEEK provides several tools that help implement standards in DM, e.g., COPASI and JWS, software for the simulation and analysis of biochemical reactions and networks, and RightField, a tool for annotating spreadsheets to create semantically aware Excel spreadsheet templates. FAIRDOM collaborates with the COmputational Modelling in BIology NEtwork (COMBINE), which coordinates standards for modelling in biology, and runs the data and modelling management component of the Infrastructure for Systems Biology Europe (ISBE).
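To make the ISA hierarchy more concrete, here is a minimal Python sketch of how an Investigation can contain Studies, which in turn contain Assays that reference data files. It is an illustration only and does not reflect SEEK's actual data model or API: the class names, fields, the helper all_data_files and the example file paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    # References to data files; large raw data can stay on local storage
    # while only the reference (metadata) is recorded here.
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Collect every data file reference across all studies and assays."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Hypothetical example content, for illustration only.
investigation = Investigation(
    title="Example investigation",
    studies=[
        Study(
            title="Example study",
            assays=[
                Assay("Transcriptomics assay", ["/local/storage/rnaseq_raw.fastq"]),
                Assay("Metabolomics assay", ["metabolite_table.xlsx"]),
            ],
        )
    ],
)

print(investigation.all_data_files())
```

The nesting mirrors the idea described above: the top-level record keeps the project context, while each assay only needs to point to where its data actually live.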
NeLS (Norwegian e-infrastructure for Life Sciences), developed and operated in the context of ELIXIR.no, is an infrastructure providing storage, data sharing and analysis tools. NeLS connects to the national data storage platform NorStore, allowing long-term storage. For data analysis and computing, NeLS uses Galaxy, an open, web-based platform for accessible, reproducible and transparent computational biological research. Galaxy allows computational workflows to be set up and used without requiring programming skills. Using Galaxy in NeLS makes it easy to document the processing and analysis of research data.
At the end of the first data management workshop, many participants expressed interest in a hands-on session on using SEEK and NeLS at a more advanced level. An integration of SEEK and NeLS for use in DLN was also suggested. Such an integration would help solve the storage capacity issue for (raw) omics datasets in DLN projects.
As an outcome of this workshop, we are currently exploring an integration of the two platforms with the FAIRDOMHub community. On the NeLS side, Kjell Petersen and Kidane M Tekle from the University of Bergen are helping us with this integration, including the technical work required. We are also organizing our first hands-on workshop on using the data management tools SEEK and NeLS. It will take place in Bergen on 8-9 May. The workshop will be useful for both experimental and computational scientists and is organized in collaboration with the DLN Research School.