Reproducibility and Provenance in Data Science
19th July 2018
0900 - Keynote - How I learned to stop worrying and love version control - Dr Stephen J Newhouse and Luke Marsden
1000 - Effective computing for research reproducibility - Dr Laura Fortunato
1100 - A crazy little thing called reproducible science - Dr Tania Allard
1200 - TBC
1400 - 1730 - Version Control for your Model, Data and Environment: Docker for Data Science (Workshop)
In the afternoon workshop you'll work through a set of hands-on, self-paced exercises with support from our facilitators. You will leave able to use tools such as Docker, Git and dotscience to ensure the provenance and reproducibility of your models, environments and data.
Attendees of all levels are welcome, although some familiarity with the command line will be helpful. Please bring your own laptop.
Stephen J Newhouse
Stephen studied Molecular Biology at the University of Liverpool and then completed a PhD in Genetics at Queen Mary University of London. He is currently Lead Data Scientist and Senior Bioinformatician at the Bioinformatics Core of the NIHR Biomedical Research Centre for Mental Health, King's College London.
His work has involved many kinds of data, including molecular, genetic and clinical data, both cross-sectional and time-series, with the aims of 1) identifying potential biomarkers for disease prediction and progression, 2) identifying novel therapeutic targets, and 3) contributing to a better understanding of human disease.
He has wide-ranging experience in the analysis of expression and SNP arrays, next-generation sequencing data and network/pathway analyses. His work has required extensive use of bioinformatic and biostatistical approaches to build and implement pipelines for mixed-omic data analysis, data integration and visualization, as well as applied predictive modelling.
Luke Marsden
Luke is the CEO and Founder of his new venture, Dotmesh. He is also a Kubernetes SIG lead for SIG-cluster-lifecycle, where he helped develop the first version of kubeadm.
He previously worked on Developer Experience at Weaveworks, where he spoke and taught at conferences, meetups and trainings on cloud native topics such as container networking, monitoring with Prometheus, continuous delivery and OpenTracing. Before that he was CTO and Founder of ClusterHQ, where he was involved from the very start of the Docker and Kubernetes journey, working closely with Docker and others to develop the first Docker volume plugin mechanism and building Flocker, the first implementation of container persistence.
Laura Fortunato
Laura's research aims to understand the evolution of human social and cultural behaviour, working at the interface of anthropology and biology. Her ongoing research covers three areas: the evolution of kinship and marriage systems, cultural evolution, and the evolution of cooperation and social complexity. Since joining Oxford in 2013, she has incorporated training in effective computing for reproducibility into graduate teaching and supervision. She leads the Reproducible Research Oxford project, which she set up in 2016 with the aim of extending such training to students, researchers and staff across the University.
Laura studied Biological Sciences at the University of Padova (Laurea, 2003) and Anthropology at University College London (MRes, 2004; PhD, 2009). Between 2010 and 2013 she held an Omidyar Fellowship at the Santa Fe Institute, where she is currently an External Professor. At the University of Oxford she is Associate Professor of Evolutionary Anthropology and Tutorial Fellow in Evolutionary Anthropology at Magdalen College.
Tania Allard
Tania works as a Research Software Engineer at the University of Leeds, where she acts as a consultant in data engineering and reproducibility in (data) science. She works with groups across a wide range of scientific disciplines to help them make better use of their data. She also advises research groups and SMEs on getting the most out of their data and building robust data analysis pipelines. Her main focus is complex analysis workflows, scalable data science, and reproducibility and replicability in computationally intensive areas. She is passionate about mentoring and about open source and its community, and is involved in a number of initiatives aimed at building more diverse and inclusive communities.
Get your free ticket
Code of Conduct
Please note that by attending the conference you agree to our code of conduct.