In this experiment, we quantify the relative overhead of Musketeer's auto-generated code compared to hand-optimized baselines (which require a significantly higher time investment). The mean overhead is always within 30%, and usually between 5% and 20%.

The variance in results (and occasional negative overhead) is due to performance variation across runs on Amazon EC2. The 30% overhead on the generated Hadoop code can be reduced with additional engineering effort.
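As a rough illustration of the metric, the sketch below computes a relative overhead as the percentage difference between the mean runtime of the generated code and the mean runtime of the baseline. This definition is an assumption (the page does not spell out the formula), the function name `relative_overhead` is hypothetical, and the runtimes are made-up numbers, not measurements from this experiment.

```python
from statistics import mean


def relative_overhead(baseline_runtimes, generated_runtimes):
    """Percentage overhead of the mean generated runtime over the mean baseline.

    Assumed definition: 100 * (generated - baseline) / baseline.
    """
    base = mean(baseline_runtimes)
    gen = mean(generated_runtimes)
    return 100.0 * (gen - base) / base


# Hypothetical runtimes in seconds; NOT data from this experiment.
baseline = [100.0, 104.0, 96.0]
generated = [110.0, 118.0, 102.0]
print(f"{relative_overhead(baseline, generated):.1f}%")  # prints "10.0%"

# Run-to-run noise can make the generated code look faster than the
# baseline, which shows up as a negative overhead.
noisy_generated = [98.0, 101.0, 95.0]
print(f"{relative_overhead(baseline, noisy_generated):.1f}%")  # prints "-2.0%"
```

Under this definition, a negative value only means that the generated code's mean runtime happened to be lower than the baseline's in the measured runs.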


Figure 11

Experimental setup

This experiment was executed on an Amazon EC2 cluster comprising 100 instances. Please check the clusters page for more details.

Result data set

The raw results for this experiment are available here.

To plot Figure 11, run the following command:

experiments/plotting_scripts$ python plot_pagerank_overhead.py \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_hadoop_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_hadoop_musketeer_ \
    "Hadoop" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_spark_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster4/stat/pagerank_twitter_spark_musketeer_merged_ \
    "Spark" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster3/stat/pagerank_twitter_naiad_100nodes_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster4/stat/pagerank_twitter_naiad_musketeer_merged_ \
    "Naiad" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_powergraph_16nodes_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_powergraph_16nodes_musketeer_ \
    "PG" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_graphchi_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_graphchi_musketeer_ \
    "GraphChi"

The graph will be written to twitter_overhead.pdf.
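The plotting command appears to take one group of three arguments per system: a baseline result prefix, a Musketeer result prefix, and a plot label. Assuming that reading is right, the sketch below rebuilds the same argument list from a compact table, which may be easier to adapt when the results live under a different directory. The names `BASE`, `SYSTEMS`, `stat_prefix` and `build_argv` are hypothetical helpers and are not part of the Musketeer scripts; the paths are copied from the command above.

```python
import shlex

# Root of the result directories, copied from the command above.
# Change this to wherever the raw results were downloaded.
BASE = "/local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2"

# (baseline cluster, baseline stem, Musketeer cluster, Musketeer stem, label)
SYSTEMS = [
    ("cluster1", "hadoop_baseline_", "cluster1", "hadoop_musketeer_", "Hadoop"),
    ("cluster1", "spark_baseline_", "cluster4", "spark_musketeer_merged_", "Spark"),
    ("cluster3", "naiad_100nodes_baseline_", "cluster4", "naiad_musketeer_merged_", "Naiad"),
    ("cluster1", "powergraph_16nodes_baseline_", "cluster1", "powergraph_16nodes_musketeer_", "PG"),
    ("cluster1", "graphchi_baseline_", "cluster1", "graphchi_musketeer_", "GraphChi"),
]


def stat_prefix(cluster, stem):
    """Build the result-file prefix for one system on one cluster."""
    return f"{BASE}/{cluster}/stat/pagerank_twitter_{stem}"


def build_argv():
    """Assemble the argument list: one (baseline, Musketeer, label) group per system."""
    argv = ["python", "plot_pagerank_overhead.py"]
    for b_cluster, b_stem, m_cluster, m_stem, label in SYSTEMS:
        argv += [stat_prefix(b_cluster, b_stem), stat_prefix(m_cluster, m_stem), label]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line to run from experiments/plotting_scripts.
    print(shlex.join(build_argv()))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run` from the `experiments/plotting_scripts` directory.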