Toward a common language to facilitate reproducible research and technology transfer: challenges and solutions
Abstract: During the past 10 years, we have considerably improved the reproducibility of experimental results from published papers by introducing the artifact evaluation process with a unified artifact appendix and reproducibility checklists, Jupyter notebooks, containers, and Git repositories. At the same time, our experience reproducing more than 200 papers shows that it can take weeks or even months of painful and repetitive interactions between teams to reproduce artifacts. This effort includes deciphering numerous README files, examining ad-hoc artifacts and containers, and figuring out how to reproduce computational results. Furthermore, snapshot containers make it difficult to optimize the performance, accuracy, power consumption, and operational costs of algorithms across the diverse and rapidly evolving software, hardware, and data used in the real world.
In this talk, I will explain how our practical artifact evaluation experience and the feedback from researchers and evaluators motivated us to develop Collective Mind (CM): a simple, intuitive, technology-agnostic, and English-like scripting language. CM helps to automatically adapt any given experiment to any software, hardware, and data while generating unified README files and synthesizing modular containers with a unified API. It is being developed by MLCommons to facilitate reproducible AI/ML systems research, minimize manual and repetitive benchmarking and optimization efforts, reduce the time and cost of reproducible research, and simplify technology transfer to production. I will also present several recent use cases of how CM helps MLCommons, the Student Cluster Competition, and artifact evaluation at ACM/IEEE conferences. I will conclude with our development plans, new challenges, possible solutions, and upcoming reproducibility and optimization challenges powered by the MLCommons Collective Knowledge platform and CM: access.cKnowledge.org.
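To give a flavor of the unified interface described above, the following is a minimal sketch (not the author's own code) of driving a CM automation from Python, assuming the publicly available "cmind" package and the MLCommons "script" automation are installed (pip install cmind; cm pull repo mlcommons@ck). The tags used here are illustrative only.

# Minimal sketch of a CM call from Python, under the assumptions stated above.
import cmind

# Ask CM to find and run an automation "script" purely by human-readable tags;
# CM then attempts to detect or install the dependencies required for the
# current software/hardware stack. The tags below are illustrative.
result = cmind.access({
    "action": "run",
    "automation": "script",
    "tags": "detect,os",
    "out": "con",   # print the console output of the resolved script
})

# CM calls conventionally return a dictionary with a numeric "return" code.
if result.get("return", 1) > 0:
    raise RuntimeError(result.get("error", "CM script run failed"))

The equivalent command-line form would be along the lines of "cm run script --tags=detect,os"; the same tag-based interface is what CM uses to generate unified README files and modular containers.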
Bio: Grigori Fursin is a co-chair of the MLCommons task force on automation and reproducibility, president of the cTuning foundation, and founder of cKnowledge.org. After completing a Ph.D. in computer science at the University of Edinburgh, Grigori was a senior tenured research scientist at INRIA, co-director of the Intel Exascale Lab, founder of the cKnowledge.io platform, and Vice President of MLOps at OctoML. He is a recipient of the ACM CGO'17 Test of Time award, the EU HiPEAC technology transfer award, and the INRIA award of scientific excellence for the world's first machine learning-based compiler.
Grigori leads the development of the open-source Collective Knowledge platform (MLCommons CK) and the Collective Mind language (MLCommons CM) to automate benchmarking, optimization, apples-to-apples comparison, and deployment of Pareto-efficient AI and ML applications across any software and hardware stack from any vendor in a unified and reproducible way. He is the author of the Artifact Evaluation and Reproducibility checklist, co-author of the ACM Artifact Review and Badging methodology, and organizer of more than a dozen artifact evaluations, reproducibility initiatives, and optimization tournaments at ACM and IEEE conferences (cTuning.org/ae). Grigori's mission is to help researchers validate their ideas in the real world in the fastest and most efficient way.