Replicable empirical machine learning research
Abstract: In the absence of mathematical theory addressing complex real-life settings beyond simplifying assumptions, the behavior and performance of machine learning methods often have to be assessed by applying them to real or simulated data and observing what happens. In this sense, methodological machine learning research can be viewed as an empirical science. Are the results published in this field reliable? When authors claim that their (new) method performs better than existing ones, should readers trust them? Is an independent study likely to obtain similar results? The answer to all these questions is probably "not always". The so-called replication crisis in science has drawn increasing attention across empirical research fields such as medicine and psychological science. What about good practice issues in methodological empirical research, i.e. research that considers methods themselves as research objects? When developing and evaluating new machine learning methods, do we adhere to the good practice principles typically promoted in other fields? I argue that the machine learning community should make substantial efforts to address what may be called the replication crisis in methodological research, in particular by trying to avoid bias in comparison studies based on simulated or real data. I discuss topics such as publication bias, cherry-picking and over-optimism, experimental design, and the necessity of neutral comparison studies, and I review recent positive developments towards more reliable empirical evidence. Benchmark studies comparing statistical learning methods, with a focus on high-dimensional biological data, will serve as examples.
Bio: Anne-Laure Boulesteix obtained a diploma in engineering from the Ecole Centrale Paris, a diploma in mathematics from the University of Stuttgart (2001), and a PhD in statistics (2005) from the Ludwig Maximilian University (LMU) of Munich. After a postdoc in medical statistics, she joined the Medical School of the University of Munich as a junior professor (2009) and professor (2012). She works at the interface between biostatistics, machine learning, and medicine, with a particular focus on metascience and the evaluation of methods. She is a member of the Munich Center for Machine Learning, a steering committee member of the STRATOS initiative, a founding member of the LMU Open Science Center, and president of the German Region of the International Biometric Society.
Reproducibility and replicability of computer simulations
Abstract: Since the early days of the reproducibility crisis, much progress has been made in understanding and improving computational reproducibility and replicability (R&R). What have we accomplished so far, and what remains to be done? I will concentrate on the state of R&R in computer simulations, i.e. experiments on computational models, leaving aside the additional complications of dealing with observational data.
The questions I will address include: Should computer simulations be made reproducible? Why? At what cost? To the last bit, or on a “good enough” basis? Can we ensure reproducibility without repeating lengthy computations? Is replicability more or less important than reproducibility in scientific practice? How replicable are computer simulations today? What are the obstacles to better replicability?
Bio: Konrad Hinsen is a CNRS researcher at the Centre de Biophysique Moléculaire in Orléans and at the Synchrotron SOLEIL in Saint Aubin (France). His main field of research is computational biophysics, in particular the structure and dynamics of proteins. A long-standing interest in improving practices in computational science has led him to research on scientific computing and to the development of software tools. In 1995, he co-founded the Numerical Python project, which started the Scientific Python ecosystem, inside which he then developed tools for molecular simulation. Today he is a contributor to the Guix project, focusing on its use in reproducible computations. He is also a co-author of two MOOCs on reproducible computational research and a co-founder of the journal "ReScience C", which publishes replication work in computational science.