Experimental Reproducibility
The goal of establishing reproducibility is to ensure your SIGMOD paper stands as a reliable, referenceable work for future research. The premise is that experimental papers will be most useful when their results have been tested and generalized by objective third parties.
The Review Process
The committee contacts the authors of accepted papers, who can submit experiments for review - on a voluntary basis - from April to September 2012. Details about the submission process will be communicated directly to the authors. The committee then decides whether to award the following labels:
- Reproducible label: the experiments reproduced by the committee support the central results reported in the paper.
- Sharable label: the experiments have been tested by the committee and are made available to the community at a provided URL.
How does the committee assess whether the reproduced experiments support the central results reported in the paper?
To get a reproducible label, a submission must fulfill the following three criteria:
- [depth] - Each submitted experiment contains
- a prototype system provided as a white box (source, configuration files, build environment) or a commercial system that is fully specified;
- the set of experiments (system configuration and initialization, scripts, workload, measurement protocol) used to produce the raw experimental data;
- the scripts needed to transform the raw data into the graphs included in the paper.
- [portability] - The results can be reproduced in a different environment (i.e., on a different OS or machine) than the original development environment.
- [coverage] - Central results and claims from the paper are supported by the submitted experiments.
Some Guidelines
Authors should make it easy for reviewers (and the community at large) to reproduce the central experimental results reported in a paper. Here are some guidelines for authors based on the experience from previous years.
We distinguish two phases in any experimental work, data acquisition and data derivation:
- Primary data acquisition: Here the issue is how to obtain the raw data upon which the conclusions are drawn. Sometimes the reproducibility committee can simply rerun software (e.g., rerun an existing benchmark). At other times, obtaining raw data may require special hardware (e.g., sensors in the Arctic). In the latter case, the committee will not be able to reproduce the acquisition of raw data, and authors should instead provide a protocol with detailed procedures for system set-up, experiment set-up and measurements. When raw data acquisition can be reproduced, authors should address the following points:
- Environment: Authors should explicitly specify the OS and tools that must be installed as the environment. Such specifications should include dependencies on specific hardware features (e.g., 25 GB of RAM are needed) and dependencies within the environment (e.g., the compiler must be run with a specific version of the operating system). Note that a virtual machine allows authors to distribute open-source environments for single-site systems. A minimal environment-check sketch is given after this list.
- System: System set-up is the most challenging aspect of repeating an experiment, as the system needs to be installed and configured in a new environment before experiments can be run. System set-up is easier to conduct if it is automatic rather than manual (see the installation sketch after this list). Authors should test that the system they distribute can actually be installed in a new environment. The documentation should detail every step of system set-up:
- How to obtain the system?
- How to configure the environment if need be (e.g., environment variables, paths)?
- How to compile the system? (existing compilation options should be documented)
- How to use the system? (what are the configuration options and parameters to the system?)
- How to make sure that the system is installed correctly?
Note that a virtual machine allows authors to distribute the system already installed in its environment.
- Experiments: Given a system, authors provide a set of experiments to reproduce the paper's results. Typically, each experiment consists of a set-up phase (where parameters are configured and data is loaded), a running phase (where a workload is applied and measurements are taken), and a clean-up phase (where the system is prepared to avoid interference with the next round of experiments). Authors should document (i) how to perform the set-up, running and clean-up phases, and (ii) how to check that these phases completed as they should. Authors should document the expected effect of the set-up phase (e.g., a cold file cache is enforced) and the different steps of the running phase (e.g., by documenting the combination of command line options used to run a given experiment script). Obviously, experiments will be easier to reproduce if they are automatic (e.g., a script that takes a range of values for each experiment parameter as arguments) rather than manual (e.g., a script that must be edited so that a constant takes the value of a given experiment parameter). A driver-script sketch is given after this list.
- Data derivation: For each graph in the paper, the authors should describe how it is obtained from the experimental measurements. Ideally, the authors release the scripts (or spreadsheets) used to compute derivations (typically statistics) and generate the graphs; a plotting sketch is given after this list.
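To make the guidelines above concrete, the sketches below use Python; all file names, commands and parameters in them are hypothetical placeholders, not part of any actual submission. First, a minimal environment check that documents the OS and verifies that the assumed tools and interpreter version are present:

    # check_env.py -- minimal sketch of an environment check (hypothetical requirements).
    import platform
    import shutil
    import sys

    REQUIRED_TOOLS = ["gcc", "make"]   # assumed build toolchain
    MIN_PYTHON = (3, 3)                # assumed interpreter version (shutil.which needs 3.3+)

    def main():
        print("OS:", platform.platform())
        if sys.version_info < MIN_PYTHON:
            sys.exit("Python >= %d.%d is required" % MIN_PYTHON)
        missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
        if missing:
            sys.exit("Missing tools: " + ", ".join(missing))
        print("Environment looks OK.")

    if __name__ == "__main__":
        main()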
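Next, a sketch of an automated system set-up, assuming a hypothetical source archive (prototype-1.0.tar.gz) with an autotools-style build; substitute the actual steps needed to obtain, configure, compile and verify your system:

    # install.py -- minimal sketch of an automated system set-up (hypothetical archive and build steps).
    import os
    import subprocess

    PREFIX = os.path.abspath("install")   # local installation directory

    def sh(cmd, cwd=None):
        print("+", " ".join(cmd))
        subprocess.check_call(cmd, cwd=cwd)

    def main():
        # 1. Obtain the system (here: unpack a source archive shipped with the submission).
        sh(["tar", "xzf", "prototype-1.0.tar.gz"])
        # 2. Configure for the local environment, compile and install.
        sh(["./configure", "--prefix=" + PREFIX], cwd="prototype-1.0")
        sh(["make", "-j4"], cwd="prototype-1.0")
        sh(["make", "install"], cwd="prototype-1.0")
        # 3. Check that the system is installed correctly.
        sh([os.path.join(PREFIX, "bin", "prototype"), "--version"])

    if __name__ == "__main__":
        main()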
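The set-up/running/clean-up cycle of an experiment can then be driven by a script that sweeps the experiment parameters and writes the raw measurements to a single file. The helpers it calls (setup.sh, workload, cleanup.sh) are hypothetical stand-ins for the submission's own scripts:

    # run_experiments.py -- minimal sketch of an automated experiment driver (hypothetical helpers).
    import csv
    import subprocess
    import time

    THREAD_COUNTS = [1, 2, 4, 8, 16]   # the experiment parameter swept in the paper

    def run_once(threads):
        subprocess.check_call(["./setup.sh", str(threads)])               # set-up: load data, enforce cold cache
        start = time.time()
        subprocess.check_call(["./workload", "--threads", str(threads)])  # running: apply the workload
        elapsed = time.time() - start                                     # measurement
        subprocess.check_call(["./cleanup.sh"])                           # clean-up: avoid interference
        return elapsed

    def main():
        with open("raw_results.csv", "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["threads", "elapsed_seconds"])
            for threads in THREAD_COUNTS:
                writer.writerow([threads, run_once(threads)])

    if __name__ == "__main__":
        main()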
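Finally, a data-derivation sketch that rebuilds a figure from the raw measurements (here the hypothetical raw_results.csv written by the driver above), using matplotlib; any plotting tool or spreadsheet serves the same purpose as long as it is released with the submission:

    # plot_results.py -- minimal sketch of a derivation script (assumes raw_results.csv from the driver above).
    import csv

    import matplotlib.pyplot as plt

    def main():
        threads, elapsed = [], []
        with open("raw_results.csv") as f:
            for row in csv.DictReader(f):
                threads.append(int(row["threads"]))
                elapsed.append(float(row["elapsed_seconds"]))
        plt.plot(threads, elapsed, marker="o")
        plt.xlabel("Number of threads")
        plt.ylabel("Elapsed time (s)")
        plt.savefig("figure1.pdf")   # the graph as it appears in the paper

    if __name__ == "__main__":
        main()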
The experiments published by Jens Teubner and Rene Mueller, from ETH Zurich, together with their SIGMOD'11 article "How Soccer Players Would Do Stream Joins", are an excellent illustration of these guidelines.
Reproducibility Committee
Philippe Bonnet, ITU, Denmark, chair
Juliana Freire, NYU, USA, chair
Radu Stoica, EPFL, Switzerland
Wei Cao, Renmin U, China
Willis Lang, U.Wisc, USA
David Koop, U.Utah, USA
Mian Lu, HKUST, China
Martin Kaufmann, ETHZ, Switzerland
Ben Sowell, Cornell, USA
Dimitris Tsirogiannis, Microsoft, USA
Lucja Kot, Cornell, USA
Dan Olteanu, Oxford, UK
Stratos Idreos, CWI, Netherlands
Ryan Johnson, U.Toronto, Canada
Matias Bjørling, ITU, Denmark
Paolo Papotti, U.Roma 3, Italy
Eli Cortez, Universidade Federal do Amazonas, Brazil