A hands-on introduction to artificial intelligence in computational biotech and medicine
Late 2020 or early 2021
About the course
Recent years have seen a surge of interest in machine learning and artificial intelligence. This has been driven by highly visible breakthroughs in areas such as computer vision, natural language processing, speech recognition and synthesis, and the analysis of structured, tabular data. As many of the challenges faced in more general domains are transferable to specialized domains, we have seen a torrent of scientific publications and new applications across all data-driven fields, from medicine to physics, biology to cosmology.
The course provides a practical, project-based, hands-on exploration of state-of-the-art techniques and software frameworks from machine learning and deep learning for solving real-world problems in biomedicine, biotech, and related fields. It will be a guided tour of a useful, interesting, and important landscape, pointing out theoretical and application-oriented gold mines along the way.
The course is structured around four labs, an in-class competition, and project pitches.
Lab 0: Getting started
Get access to necessary cloud computing infrastructure and tools. The students are pointed to resources for gaining the necessary background in programming (Python and friends) and machine learning. The students will be encouraged to go through the material in Lab 0 on their own before the first gathering in the course.
Lab 1: Sequences and tabular data
Topics will include applications in e.g. medical health records and single-cell multi-omics, using classical machine learning and deep learning (e.g. natural language processing techniques, embeddings, sequence-to-sequence mappings).
Lab 2: Images in time and space
Topics will include live-cell imaging, MRI and video sequences (in behavioral and biological surveillance), using e.g. supervised and unsupervised classification and convolutional neural networks for image segmentation, classification, object detection, and regression.
Lab 3: Drug discovery, RRI, and ethics
Topics will include scoring functions for docking and drug-target interactions, quantitative structure-activity/structure-property relationships (QSAR/QSPR), and de novo structure generation, using classical machine learning and deep learning. The lectures associated with the lab will also deal with topics from responsible research and innovation (RRI), ethics, and explainable AI, which are important for all applications of machine learning.
There will be a project-based competition where the students are divided into groups (aiming for a combination of "wet lab" and "dry lab" members), competing on a biomedical task using the techniques of the course. The competition will be hosted on Kaggle (the course organizers have good experience with organizing such competitions in other courses).
Project pitch with plenary presentation
As part of the course, the participants will pitch an idea or sketch of how to use AI/ML in their own PhD project, or in a closely related area. For example, a PhD candidate working on single-cell RNA sequencing or CITE-seq could present how deep learning might be used in the integrative analysis, pointing to the literature or digging into it concretely. The students should produce a short, condensed written summary and give an oral, plenary speed-presentation (5-7 mins) for their fellow students.
By the end of the course the participants will have a solid understanding of the ideas of machine learning, and good experience and practical intuition about how the methods can be applied in biomedical domains, using modern tools and frameworks. They will also be familiar with the challenges and limitations of applied machine learning.
- Able to explain the fundamental concepts of machine learning
- Able to explain how machine learning can be used to solve practical problems in biological and biomedical domains
- Able to explain the limitations and challenges involved in using machine learning in biology, biomedicine, and biotech, and the related societal challenges, including an understanding of the concepts of explainable AI and RRI
- An appreciation for the importance of open science, reproducible research, and of the role of competitions and challenges in modern data science-based research
- Can find and use modern software tools for data analysis, visualization, and reporting (e.g. figure/graphics production with Jupyter notebooks).
- Transdisciplinary communication: can communicate selected methods, and the software packages in which they are implemented, and explain their relevance to medical research and clinical practice.
Acknowledging the importance of mathematical models and computations in the analysis and understanding of complex biological systems, and the need for cross-disciplinary collaboration in future biotech and medicine.
The course gives 5 ECTS, and the students will be evaluated based on three activities:
- The course Kaggle InClass challenge, including the plenary presentations
- Project pitch
- Individual MCQ exam using WISEflow
Block 0: before the course starts
- Lab 0: Getting started. Description of the course and of how the students should work with the code repository, the challenge, and their project pitch.
Block 1: 21-26 September
- Motivation lectures (motivates the course topics, structure and philosophy)
- Lectures and hands-on work on lab 1
- Lectures and hands-on work on lab 2
- Social activity
Block 2: 5-9 October
- Lectures and hands-on work on lab 3
- Closing lecture
- Plenary, group-wise presentation of work on the course challenge
- Plenary, individual pitch of project (speed-presentation of about 5-7 mins)
- Plenary discussions
- Digital exam from home or on the premises (student's choice)
The course is aimed at the subset of DLN research school members who already have some experience in programming with Python, R, or MATLAB, and a strong interest in machine learning and artificial intelligence. Participants from methodological fields like computer science, statistics, and mathematics are welcome to join the course, as long as they have some familiarity with biological or biomedical data and a strong interest in biotech or medicine.