In this experiment, we quantify the relative overhead of using Musketeer's auto-generated code compared to hand-optimized baselines (which require a significantly higher time investment). The mean overhead is always within 30%, and usually between 5% and 20%.
The variance in results (and occasional negative overhead) is due to performance variation across runs on Amazon EC2. The 30% overhead on the generated Hadoop code can be reduced with additional engineering effort.
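As a rough illustration of the metric, the sketch below computes a mean relative overhead from two lists of per-run makespans. The function name `relative_overhead` and the sample runtimes are hypothetical and are not taken from the experiment's scripts or data. We assume here that overhead is defined as the difference between the mean runtimes, expressed as a percentage of the baseline mean.

```python
from statistics import mean


def relative_overhead(baseline_runtimes, generated_runtimes):
    """Mean overhead of generated code over the baseline, in percent.

    Assumed definition: (mean(generated) - mean(baseline)) / mean(baseline) * 100.
    A negative value means the generated code was faster on average.
    """
    base = mean(baseline_runtimes)
    gen = mean(generated_runtimes)
    return (gen - base) / base * 100.0


if __name__ == "__main__":
    # Made-up runtimes in seconds, for illustration only.
    baseline = [100.0, 100.0]
    generated = [120.0, 130.0]
    print(f"overhead: {relative_overhead(baseline, generated):.1f}%")
```

With this definition, run-to-run variation alone can push the result below zero whenever the baseline runs happen to be slower than the generated-code runs.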
Under construction: we will add information on the experimental setup and our data sets here shortly.
If you are interested in being notified when the data appears, please join our musketeer-announce mailing list. Thanks for your patience.
-- The Musketeer team.
This experiment was executed on an Amazon EC2 cluster comprising 100 instances. Please check the clusters page for more details.
The raw results for this experiment are available here.
To plot Figure 11, run the following command:
experiments/plotting_scripts$ python plot_pagerank_overhead.py \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_hadoop_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_hadoop_musketeer_ "Hadoop" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_spark_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster4/stat/pagerank_twitter_spark_musketeer_merged_ "Spark" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster3/stat/pagerank_twitter_naiad_100nodes_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster4/stat/pagerank_twitter_naiad_musketeer_merged_ "Naiad" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_powergraph_16nodes_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_powergraph_16nodes_musketeer_ "PG" \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_graphchi_baseline_ \
    /local/scratch/icg27/Dropbox/phd/Musketeer/experiments/page_rank/ec2/cluster1/stat/pagerank_twitter_graphchi_musketeer_ "GraphChi"
The graph will be written to twitter_overhead.pdf.