Figure 3a (page 3) illustrates that the best choice of data processing system for PageRank on the Orkut follower graph depends on scale. Investing more resources reduces the overall makespan, but also reduces efficiency.

At the small scale of the Orkut graph, a single-machine system (GraphChi) performs almost as well as a 16-node system (PowerGraph) and only 2-5x worse than a 100-node cluster (GraphLINQ/Spark).
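The trade-off between makespan and efficiency can be made concrete with a little arithmetic. The sketch below is only illustrative: the helper names `machine_seconds` and `relative_efficiency` are hypothetical, efficiency is assumed to mean speedup divided by the factor of extra machines, and the makespans are made-up placeholders rather than measurements from this experiment.

```python
def machine_seconds(nodes: int, makespan_s: float) -> float:
    """Aggregate resource use: number of machines times wall-clock makespan."""
    return nodes * makespan_s


def relative_efficiency(base_nodes: int, base_makespan_s: float,
                        nodes: int, makespan_s: float) -> float:
    """Speedup over the baseline divided by the factor of extra machines used."""
    speedup = base_makespan_s / makespan_s
    scale = nodes / base_nodes
    return speedup / scale


# Hypothetical makespans in seconds, chosen only to illustrate the arithmetic.
# They are NOT results from the Orkut experiment.
example = {1: 500.0, 16: 400.0, 100: 125.0}

for nodes, makespan in example.items():
    cost = machine_seconds(nodes, makespan)
    eff = relative_efficiency(1, example[1], nodes, makespan)
    print(f"{nodes:>3} nodes: makespan={makespan}s, "
          f"machine-seconds={cost}, relative efficiency={eff:.3f}")
```

With these placeholder values the 100-node run finishes soonest but consumes the most machine-seconds, which is the pattern the text describes.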

Figure 3a

Experimental setup

This experiment was executed on Amazon EC2. We ran the computation using 1, 16 and 100 instances. Please check the clusters page for more details.

Result data set

The raw results for this experiment are available here.

To plot Figure 3a, run the following command:

experiments/plotting_scripts$ python \
    ../page_rank/ec2/cluster3/stat/pagerank_orkut_naiad_100nodes_baseline_ "GraphLINQ" \
    ../page_rank/ec2/cluster2/stat/pagerank_orkut_spark_baseline_ "Spark" \
    ../page_rank/ec2/cluster1/stat/pagerank_orkut_hadoop_baseline_ "Hadoop" \
    ../page_rank/ec2/cluster1/stat/pagerank_orkut_powergraph_16nodes_baseline_ "PowerGraph" \
    ../page_rank/ec2/cluster3/stat/pagerank_orkut_naiad_16nodes_baseline_ "GraphLINQ" \
    ../page_rank/ec2/cluster1/stat/pagerank_orkut_graphchi_baseline_ "GraphChi"

The graph will be written to orkut_makespan.pdf.
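The arguments to the plotting command alternate between a result-file prefix and the legend label for that system. As a minimal sketch of how such an alternating list could be grouped, assuming that layout (the function `pair_args` is hypothetical and is not taken from the project's plotting scripts):

```python
def pair_args(args):
    """Group a flat [prefix, label, prefix, label, ...] list into (prefix, label) pairs."""
    if len(args) % 2 != 0:
        raise ValueError("expected an even number of arguments: prefix/label pairs")
    return list(zip(args[0::2], args[1::2]))


# Two of the prefix/label pairs from the command above, used as sample input.
args = [
    "../page_rank/ec2/cluster1/stat/pagerank_orkut_powergraph_16nodes_baseline_", "PowerGraph",
    "../page_rank/ec2/cluster1/stat/pagerank_orkut_graphchi_baseline_", "GraphChi",
]

for prefix, label in pair_args(args):
    print(label, "->", prefix)
```

Checking for an even argument count up front catches a dropped label or path before any plotting work starts.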