2014 Linux Symposium, July 14-16

DataMill: Rigorous Performance Evaluation Made Easy

Augusto Oliveira (a3olivei@uwaterloo.ca)

Did you know that minor aspects of your experimental setup can have a significant impact on your performance measurements and can potentially invalidate the conclusions you draw from them? Examples of such influences, often called hidden factors, include binary link order, POSIX environment size, randomly generated symbol names, and Linux scheduling group assignments. These factors can change measured performance by more than 10%. The growing size and complexity of modern computer systems will further aggravate this problem, especially given the pressure to produce results quickly. So how can anyone trust an empirical analysis of a new idea or concept in computer performance? That is where DataMill (http://datamill.uwaterloo.ca) comes in!

DataMill is a Linux-based, easy-to-use benchmarking infrastructure that helps you produce robust, reliable, and reproducible performance results. It is already in use at several universities around the world. You can submit your experiment for execution on DataMill and receive thousands of data points covering a wide array of hardware platforms and software setups. This data allows you to reach much broader and more robust conclusions than you could with data from a single machine and software setup. This tutorial will cover everything you need to know to use DataMill: packaging experiments, selecting an experiment design, plotting and analyzing results, and contributing worker nodes in order to access the site.