Reproducing Performance - The Good, the Bad, and the Ugly
Abstract: Containers and Jupyter notebooks are useful tools for reproducing the computational results of any packaged application. However, if execution performance or efficiency is itself the science result, matters are more complex. It may not be sufficient to package codes in containers; in fact, containers may disturb the performance results and their reproducibility. We outline a set of techniques to facilitate performance reproducibility in various settings. Some performance results may be linked to specific computer architectures or even specific system configurations that may not be accessible to other researchers, or even to the original team after a software update. We outline techniques that help researchers interpret results on the original system even if it is practically impossible to reproduce the original results. We discuss such techniques both in the context of pure performance and in the context of the emerging field of data science and artificial intelligence, which often allows for a performance-accuracy tradeoff. All in all, our work provides a set of guidelines to follow to support the reproducible science of performance and benchmarking.
Bio: Torsten Hoefler is a Professor of Computer Science at ETH Zurich, a member of Academia Europaea, and a Fellow of the ACM and IEEE. Following a “Performance as a Science” vision, he combines mathematical models of architectures and applications to design optimized computing systems. Before joining ETH Zurich, he led the performance modeling and simulation efforts for the first sustained Petascale supercomputer, Blue Waters, at the University of Illinois at Urbana-Champaign. He is also a key contributor to the Message Passing Interface (MPI) standard, where he chaired the “Collective Operations and Topologies” working group. Torsten won best paper awards at ACM/IEEE Supercomputing in 2010, 2013, 2014, 2019, and 2022, and at other international conferences. He has published numerous peer-reviewed scientific articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. For his work, Torsten received the IEEE CS Sidney Fernbach Memorial Award in 2022, the ACM Gordon Bell Prize in 2019, the IEEE TCSC Award of Excellence (MCR), ETH Zurich’s Latsis Prize, the SIAM SIAG/Supercomputing Junior Scientist Prize, the IEEE TCSC Young Achievers in Scalable Computing Award, and the BenchCouncil Rising Star Award. Following his Ph.D., he received the 2014 Young Alumni Award and the 2022 Distinguished Alumni Award of his alma mater, Indiana University. Torsten was elected to the first steering committee of ACM’s SIGHPC in 2013 and has been re-elected every term since. He was the first European to receive many of those honors; he also received both an ERC Starting and a Consolidator grant. His research interests revolve around the central topic of performance-centric system design and include scalable networks, parallel programming techniques, and performance modeling for large-scale simulations and artificial intelligence systems. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.
Embracing Computational Reproducibility: Challenges, Solutions, and Cultivating Trust in Data-Driven Science
Abstract: The abundance of data, accessible computing power, and storage has revolutionized science and ushered in an era of data-driven scientific discoveries. However, this paradigm shift has raised critical questions about how to adapt the scientific process to ensure transparency and reproducibility in the era of data and computation.
In this talk, I will delve into the challenges involved in capturing and managing computational provenance, and examine the evolution of methods and tools that have been proposed to facilitate transparency and reproducibility. Although significant progress has been made in this domain, achieving widespread adoption of reproducibility best practices remains a persistent challenge in scientific research.
To establish computational reproducibility as the norm, I advocate for a comprehensive approach that encompasses three key elements: the development of cyberinfrastructure that seamlessly integrates reproducibility as an essential component; education to instill reproducibility principles within the scientific community; and incentives that reward reproducible research practices. Ultimately, I argue that reproducibility should not be viewed as an isolated objective but rather as a means to empower experts to debug, explain, and build trust in the insights they derive from their research. By embracing computational reproducibility as an integral part of the scientific process, we can drive scientific progress, enhance credibility, and leverage the transformative potential of data-driven research.
Bio: Juliana Freire is a Professor of Computer Science and Data Science at New York University and co-directs the Visualization Imaging and Data Analysis Center (VIDA) at the Tandon School of Engineering. She was the elected chair of the ACM Special Interest Group on Management of Data (SIGMOD), served as a council member of the Computing Research Association’s Computing Community Consortium (CCC), was the NYU lead investigator for the Moore-Sloan Data Science Environment, and served as a member of the National Academies Committee on Reproducibility and Replicability in Science. She develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. This work spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, as well as application areas including urban analytics, predictive modeling, and computational reproducibility. Freire has co-authored over 200 technical papers (including 11 award-winning publications) and several open-source systems, and is an inventor of 12 U.S. patents. She is an AAAS Fellow, an ACM Fellow, and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She received the ACM SIGMOD Contributions Award in 2020. Her research has been funded by the National Science Foundation, DARPA, the Department of Energy, the National Institutes of Health, the Sloan Foundation, the Gordon and Betty Moore Foundation, the W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo!, and IBM. She received a B.S. degree in computer science from the Federal University of Ceara (Brazil), and M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook.
Toward a common language to facilitate reproducible research and technology transfer: challenges and solutions
Abstract: During the past 10 years, we have considerably improved the reproducibility of experimental results from published papers by introducing the artifact evaluation process with a unified artifact appendix and reproducibility checklists, Jupyter notebooks, containers, and Git repositories. On the other hand, our experience reproducing more than 200 papers shows that it can take weeks or months of painful and repetitive interactions between teams to reproduce artifacts. This effort includes deciphering numerous README files, examining ad-hoc artifacts and containers, and figuring out how to reproduce computational results. Furthermore, snapshot containers make it challenging to optimize algorithms’ performance, accuracy, power consumption, and operational costs across the diverse and rapidly evolving software, hardware, and data used in the real world.
In this talk, I will explain how our practical artifact evaluation experience and the feedback from researchers and evaluators motivated us to develop a simple, intuitive, technology-agnostic, and English-like scripting language called Collective Mind (CM). It helps to automatically adapt any given experiment to any software, hardware, and data while automatically generating unified README files and synthesizing modular containers with a unified API. It is being developed by MLCommons to facilitate reproducible AI/ML systems research, minimize manual and repetitive benchmarking and optimization efforts, reduce the time and costs of reproducible research, and simplify technology transfer to production. I will also present several recent use cases of how CM helps MLCommons, the Student Cluster Competition, and artifact evaluation at ACM/IEEE conferences. I will conclude with our development plans, new challenges, possible solutions, and upcoming reproducibility and optimization challenges powered by the MLCommons Collective Knowledge platform and CM: access.cKnowledge.org.
Bio: Grigori Fursin is a co-chair of the MLCommons task force on automation and reproducibility, president of the cTuning foundation, and founder of cKnowledge.org. After completing a Ph.D. in computer science at the University of Edinburgh, Grigori was a senior tenured research scientist at INRIA, co-director of the Intel Exascale Lab, founder of the cKnowledge.io platform, and Vice President of MLOps at OctoML. He is a recipient of the ACM CGO'17 Test of Time award, the EU HiPEAC technology transfer award, and the INRIA award of scientific excellence for the world’s first machine-learning-based compiler.
Grigori leads the development of the open-source Collective Knowledge platform (MLCommons CK) and the Collective Mind language (MLCommons CM) to automate benchmarking, optimization, apples-to-apples comparison, and deployment of Pareto-efficient AI and ML applications across any software and hardware stacks from any vendor in a unified and reproducible way. He is the author of the Artifact Evaluation and Reproducibility checklist, co-author of the ACM Artifact Review and Badging methodology, and organizer of more than a dozen artifact evaluations, reproducibility initiatives, and optimization tournaments at ACM and IEEE conferences (cTuning.org/ae). Grigori’s mission is to help researchers validate their ideas in the real world in the fastest and most efficient way.