Figure 3a (page 3) shows that the best choice of data processing system for PageRank on the Orkut follower graph depends on scale. Adding resources shortens the overall makespan, but lowers efficiency.
On the small scale of the Orkut graph, a single-machine system (GraphChi) performs almost as well as a 16-node system (PowerGraph) and only 2-5x better than a 100-node cluster (GraphLINQ/Spark).
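The trade-off between makespan and efficiency can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that "efficiency" means speedup over a baseline divided by the factor of extra machines used, which is one common definition and not necessarily the one behind Figure 3a. The function names `node_seconds` and `relative_efficiency` are hypothetical, and the numbers in the usage section are placeholders, not measurements from this experiment.

```python
def node_seconds(nodes: int, makespan_s: float) -> float:
    """Aggregate resource usage: number of machines held times wall-clock time."""
    if nodes <= 0 or makespan_s <= 0:
        raise ValueError("nodes and makespan_s must be positive")
    return nodes * makespan_s


def relative_efficiency(base_nodes: int, base_makespan_s: float,
                        nodes: int, makespan_s: float) -> float:
    """Speedup over the baseline divided by the factor of extra machines.

    A value of 1.0 means perfect scaling; values below 1.0 mean the extra
    machines bought less speedup than their number would suggest.
    (Assumed definition, not taken from the original page.)
    """
    if min(base_nodes, nodes) <= 0 or min(base_makespan_s, makespan_s) <= 0:
        raise ValueError("all arguments must be positive")
    speedup = base_makespan_s / makespan_s
    scale = nodes / base_nodes
    return speedup / scale


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT results from the experiment:
    # a 1-node run taking 100 s versus a 16-node run taking 20 s.
    print(node_seconds(1, 100.0))
    print(node_seconds(16, 20.0))
    print(relative_efficiency(1, 100.0, 16, 20.0))
```

In this made-up case the larger deployment finishes sooner but consumes more node-seconds overall, which is the pattern described above: lower makespan, lower efficiency.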
This experiment was executed on Amazon EC2. We ran the computation using 1, 16 and 100 instances. Please check the clusters page for more details.
The raw results for this experiment are available here.
To plot Figure 3a, run the following command:
experiments/plotting_scripts$ python plot_pagerank_nobreakdown_motivation.py \
    ../page_rank/ec2/cluster3/stat/pagerank_orkut_naiad_100nodes_baseline_ "GraphLINQ" \
    ../page_rank/ec2/cluster2/stat/pagerank_orkut_spark_baseline_ "Spark" \
    ../page_rank/ec2/cluster1/stat/pagerank_orkut_hadoop_baseline_ "Hadoop" \
    ../page_rank/ec2/cluster1/stat/pagerank_orkut_powergraph_16nodes_baseline_ "PowerGraph" \
    ../page_rank/ec2/cluster3/stat/pagerank_orkut_naiad_16nodes_baseline_ "GraphLINQ" \
    ../page_rank/ec2/cluster1/stat/pagerank_orkut_graphchi_baseline_ "GraphChi"
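The command above appears to take alternating arguments: a result-file prefix followed by the label to use for that series. Assuming that reading is correct, the argument list can be assembled from (prefix, label) pairs, which makes it easier to add, drop, or reorder series. The helper `build_plot_command` below is a hypothetical convenience, not part of the repository; only the script name, paths, and labels come from the command above.

```python
import shlex

SCRIPT = "plot_pagerank_nobreakdown_motivation.py"

# (result-file prefix, series label) pairs, copied from the command above.
SERIES = [
    ("../page_rank/ec2/cluster3/stat/pagerank_orkut_naiad_100nodes_baseline_", "GraphLINQ"),
    ("../page_rank/ec2/cluster2/stat/pagerank_orkut_spark_baseline_", "Spark"),
    ("../page_rank/ec2/cluster1/stat/pagerank_orkut_hadoop_baseline_", "Hadoop"),
    ("../page_rank/ec2/cluster1/stat/pagerank_orkut_powergraph_16nodes_baseline_", "PowerGraph"),
    ("../page_rank/ec2/cluster3/stat/pagerank_orkut_naiad_16nodes_baseline_", "GraphLINQ"),
    ("../page_rank/ec2/cluster1/stat/pagerank_orkut_graphchi_baseline_", "GraphChi"),
]


def build_plot_command(series):
    """Return the argv list: interpreter, script, then prefix/label pairs in order.

    Assumes the script expects alternating prefix and label arguments,
    as the command on this page suggests.
    """
    argv = ["python", SCRIPT]
    for prefix, label in series:
        argv.extend([prefix, label])
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line; run it from experiments/plotting_scripts.
    print(shlex.join(build_plot_command(SERIES)))
```

The resulting list can also be passed directly to `subprocess.run` with `cwd` set to the `experiments/plotting_scripts` directory, which avoids any shell quoting issues.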
The graph will be in