Embracing Computational Reproducibility: Challenges, Solutions, and Cultivating Trust in Data-Driven Science

Abstract: The abundance of data, accessible computing power, and storage has revolutionized science and ushered in an era of data-driven scientific discoveries. However, this paradigm shift has raised critical questions about how to adapt the scientific process to ensure transparency and reproducibility in the era of data and computation.

In this talk, I will delve into the challenges involved in capturing and managing computational provenance, and examine the evolution of methods and tools that have been proposed to facilitate transparency and reproducibility. Although significant progress has been made in this domain, achieving widespread adoption of reproducibility best practices remains a persistent challenge in scientific research.

To establish computational reproducibility as the norm, I advocate for a comprehensive approach that encompasses three key elements: the development of cyberinfrastructure that seamlessly integrates reproducibility as an essential component; education to instill reproducibility principles within the scientific community; and incentives that reward reproducible research practices. Ultimately, I argue that reproducibility should not be viewed as an isolated objective but rather as a means to empower experts to debug, explain, and build trust in the insights they derive from their research. By embracing computational reproducibility as an integral part of the scientific process, we can drive scientific progress, enhance credibility, and leverage the transformative potential of data-driven research.

Bio: Juliana Freire is a Professor of Computer Science and Data Science at New York University and co-directs the Visualization Imaging and Data Analysis Center (VIDA) at the Tandon School of Engineering. She was the elected chair of the ACM Special Interest Group on Management of Data (SIGMOD), served as a council member of the Computing Research Association’s Computing Community Consortium (CCC), was the NYU lead investigator for the Moore-Sloan Data Science Environment, and served as a member of the National Academies Committee on Reproducibility and Replicability in Science. She develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. Her work spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, as well as application areas including urban analytics, predictive modeling, and computational reproducibility. Freire has co-authored over 200 technical papers (including 11 award-winning publications) and several open-source systems, and is an inventor of 12 U.S. patents. She is a AAAS Fellow, an ACM Fellow, and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She received the ACM SIGMOD Contributions Award in 2020. Her research has been funded by the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo! and IBM. She received a B.S. degree in computer science from the Federal University of Ceara (Brazil), and M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook.