Replicable empirical machine learning research
Abstract: In the absence of mathematical theory that addresses complex real-life settings beyond simplifying assumptions, the behavior and performance of machine learning methods often have to be assessed by applying them to real or simulated data and observing what happens. In this sense, methodological machine learning research can be viewed as an empirical science. Are the results published in this field reliable? When authors claim that their (new) method performs better than existing ones, should readers trust them? Is an independent study likely to obtain similar results? The answer to all these questions is probably "not always". The so-called replication crisis in science has drawn increasing attention across empirical research fields such as medicine and psychological science. What about good practice in methodological empirical research, that is, research that considers methods as its research objects? When developing and evaluating new machine learning methods, do we adhere to the good practice principles typically promoted in other fields? I argue that the machine learning community should make substantial efforts to address what may be called the replication crisis in methodological research, in particular by trying to avoid bias in comparison studies based on simulated or real data. I discuss topics such as publication bias, cherry-picking and over-optimism, experimental design, and the necessity of neutral comparison studies, and I review recent positive developments towards more reliable empirical evidence. Benchmark studies comparing statistical learning methods, with a focus on high-dimensional biological data, will be used as examples.
Bio: Anne-Laure Boulesteix obtained a diploma in engineering from the Ecole Centrale Paris, a diploma in mathematics from the University of Stuttgart (2001) and a PhD in statistics (2005) from the Ludwig Maximilian University (LMU) of Munich. After a postdoc phase in medical statistics, she joined the Medical School of the University of Munich as a junior professor (2009) and was appointed professor in 2012. She works at the interface between biostatistics, machine learning and medicine, with a particular focus on metascience and the evaluation of methods. She is part of the Munich Center for Machine Learning, a steering committee member of the STRATOS initiative, a founding member of the LMU Open Science Center and president of the German Region of the International Biometric Society.