Communicating Statistics in Biotechnology
When I started my postdoctoral research at the Centre for Molecular Medicine Norway (NCMM), I soon realized that collaboration between statisticians and experimental biologists can be complicated. Here is what I have learned.
Statistics is common practice in most fields of science. A biologist might need statistics to test hypotheses when comparing new compounds, a financial advisor uses it to maximize profits in the trading market, and a GP uses it to report the number of this year’s flu cases. All these activities share one characteristic: the need to describe real-life events quantitatively.
To do this, we need a language shared across disciplines, and that language is mathematics. When the numbers describe uncertainty, we call it statistics.
Statistics meets the need to report biological events precisely and reliably, and it is the best way to organize and present results that vary because of different sources of uncertainty. Still, biologists and statisticians often interpret the same results differently. This disparity is mainly caused by a lack of communication between those who perform the experiments and those who analyze the data.
In biology and biotechnology, experimental conditions are artificially created to test a specific set of hypotheses. This setup encourages expectations to form before the experiment even begins, which can bias how the results are interpreted.
A common example of such misunderstanding is the use of the p-value: this number alone rarely gives a full picture of the testing procedure, because it conveys neither the size of an effect nor how precisely it was estimated. Relying on it alone also risks ignoring other information in the data, which distorts the interpretation. The erroneous conclusion is then that “the statistics are wrong”, when the real problem lies in how the statistical analyses were interpreted.
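To illustrate why a p-value alone can mislead, here is a minimal sketch comparing two made-up experiments. It uses a simplified two-sided z-test on summary statistics (assuming both groups share a known standard deviation and have equal size, so a normal approximation replaces the usual t-test); the function name and all numbers are hypothetical and chosen only for illustration.

```python
import math


def summarize_difference(mean_a, mean_b, sd, n):
    """Two-sided z-test for a difference in group means.

    Simplifying assumptions (for illustration only): both groups have the
    same known standard deviation `sd` and the same size `n`, so a normal
    approximation is used instead of a t-test.
    Returns the effect size (difference in means), the p-value and a 95% CI.
    """
    diff = mean_b - mean_a
    se = sd * math.sqrt(2.0 / n)                   # standard error of the difference
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    half_width = 1.96 * se                         # 95% normal-approximation interval
    return diff, p_value, (diff - half_width, diff + half_width)


# Hypothetical experiment 1: tiny effect, very large sample
small_effect = summarize_difference(mean_a=100.0, mean_b=101.0, sd=10.0, n=2000)
# Hypothetical experiment 2: large effect, very small sample
large_effect = summarize_difference(mean_a=100.0, mean_b=115.0, sd=10.0, n=5)

for label, (diff, p, (low, high)) in [("small effect", small_effect),
                                      ("large effect", large_effect)]:
    print(f"{label}: difference={diff:.1f}, p={p:.4f}, 95% CI=({low:.2f}, {high:.2f})")
```

Experiment 1 has the smaller p-value, yet its effect is a one-unit shift; experiment 2 shows a fifteen-unit shift with a much wider interval. Reporting the effect size and its interval next to the p-value makes that difference visible to both the biologist and the statistician.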
A second problem is the choice of statistical method for analyzing laboratory data. Experimental scientists often pick a very simple statistical tool to interpret a very complex biological system. For instance, reducing the results of thousands of gene expression measurements to a standard multiple-testing problem oversimplifies the biology and is often inconclusive.
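To make concrete what such a reduction looks like, here is a minimal sketch of one common multiple-testing correction, the Benjamini-Hochberg procedure, applied to a list of per-gene p-values. The choice of this particular correction, the function name and the p-values are all assumptions made for illustration; they are not taken from any real experiment.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the (0-based) indices rejected by the Benjamini-Hochberg procedure.

    Controls the false discovery rate at level `alpha` when the tests are
    independent (or satisfy certain forms of positive dependence).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is <= rank/m * alpha
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical per-gene p-values (made up for illustration, one per gene)
gene_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

naive_hits = [i for i, p in enumerate(gene_p_values) if p < 0.05]
bh_hits = benjamini_hochberg(gene_p_values)
print("unadjusted p < 0.05:", naive_hits)
print("Benjamini-Hochberg: ", bh_hits)
```

Whichever correction is chosen, the output is just a list of gene indices: correlation between genes, pathway structure and the experimental design play no part in it, which is exactly the kind of reduction that can leave the biological question unanswered.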
Advances in biological methodology need to be matched by adequate statistical and mathematical tools.
During my PhD I analyzed time-varying data, but my main project at NCMM involves anti-cancer drug combinations: a totally different kettle of fish! After a lot of reading and many questions to my new collaborators, I was able to tackle the problem formally. Thanks to the universality of statistical language, I could communicate my ideas and results to my colleagues despite our different educational backgrounds.
Statisticians are increasingly involved in the applied sciences. We advise lab members on experimental design, downstream validation, and appropriate statistical methods. However, a productive collaborative environment is a two-way effort that requires consistent communication from both sides.