Figure 9 (page 11) shows that some workflows run best when they are split across several data processing systems. Specifically, it shows the runtime of a cross-community PageRank workflow: the intersection of two communities, followed by a PageRank over the users present in both. Musketeer makes it easy to generate code that combines systems in this way.
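The two stages of the workflow can be sketched in plain Python. This is a minimal, single-machine sketch under assumed semantics, not the code Musketeer generates: each community is modelled as a set of directed (follower, followee) edges, the intersection keeps only the edges present in both communities (so every remaining user appears in both), and PageRank uses a damping factor of 0.85, a fixed number of iterations and uniform redistribution of rank from users without out-edges. The function names are hypothetical.

```python
# Hypothetical model: a community is a set of directed (follower, followee) edges.
Edge = tuple[int, int]


def intersect_communities(comm_a: set[Edge], comm_b: set[Edge]) -> set[Edge]:
    """Stage 1 (assumed semantics): keep only the edges present in both communities."""
    return comm_a & comm_b


def pagerank(edges: set[Edge], damping: float = 0.85,
             iterations: int = 50) -> dict[int, float]:
    """Stage 2: power-iteration PageRank over the users that occur in `edges`."""
    nodes = {user for edge in edges for user in edge}
    if not nodes:
        return {}
    n = len(nodes)
    out_links: dict[int, list[int]] = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Users with no out-edges spread their rank uniformly over all users.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Every other user splits its rank evenly among the users it links to.
        for src, dsts in out_links.items():
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new_rank[dst] += share
        rank = new_rank
    return rank


def cross_community_pagerank(comm_a: set[Edge], comm_b: set[Edge],
                             damping: float = 0.85,
                             iterations: int = 50) -> dict[int, float]:
    """Intersect the two communities, then rank the users they share."""
    return pagerank(intersect_communities(comm_a, comm_b), damping, iterations)
```

For example, if both communities contain the follower cycle 1 → 2 → 3 → 1 but differ elsewhere, only that cycle survives the intersection and each of its three users receives a rank of 1/3. Keeping the relational step and the iterative graph step separate mirrors, presumably, the split point that lets the combined configurations in Figure 9 run each stage on a different system.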

Combinations of systems (middle cluster) perform on par with the best single-system implementation (Lindi, top cluster). Moreover, combining two different Naiad front-ends (Lindi and GraphLINQ) performs best, with a runtime 2.5x lower than that of the Lindi-only implementation.


Figure 9



Experimental setup

This experiment was executed on our small dedicated cluster of seven machines.

Result data set

The raw results for this experiment are available here.

To plot Figure 9, run the following command:

experiments/plotting_scripts$ python plot_comm-pagerank_barchart.py \
    ../community_page_rank/com_rank.csv \
    HadoopMerged "Hadoop only" \
    SparkMerged "Spark only" \
    MNaiad "Lindi only" \
    HadoopSpark "Hadoop \& Spark" \
    HadoopGraphLabBS2m "Hadoop \& PowerGraph" \
    HadoopChiBSFree "Hadoop \& GraphChi" \
    LindiGraphLinq "Lindi \& GraphLINQ" \
    cross_comm_pr_makespan.pdf

The resulting plot is written to cross_comm_pr_makespan.pdf.
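The plotting command follows a simple argument layout: the input CSV first, then pairs of a workflow key (presumably the name under which each configuration appears in com_rank.csv) and the bar label to show for it, and finally the output PDF. The labels escape the ampersand as \&, presumably because they are rendered through LaTeX. The hypothetical Python sketch below only illustrates how such an argument list splits into its parts; parse_plot_args is an invented name and the code is not taken from plot_comm-pagerank_barchart.py.

```python
def parse_plot_args(argv: list[str]) -> tuple[str, list[tuple[str, str]], str]:
    """Split argv into (input CSV, [(workflow key, bar label), ...], output PDF).

    Hypothetical helper: it mirrors the argument layout of the command above,
    not the actual implementation of the plotting script.
    """
    # Expect the CSV, an even number of key/label arguments, then the PDF.
    if len(argv) < 2 or (len(argv) - 2) % 2 != 0:
        raise ValueError("expected: <input.csv> (<key> <label>)* <output.pdf>")
    csv_path, out_path = argv[0], argv[-1]
    middle = argv[1:-1]
    # Pair every workflow key with the label that follows it.
    pairs = list(zip(middle[0::2], middle[1::2]))
    return csv_path, pairs, out_path
```

Under this reading, adding another configuration to the bar chart would just mean appending one more key/label pair before the output file name.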