SUPERVISION 2

1) Read this paper: research.google.com/archive/chubby-osdi06.pdf
   Summarise the main contributions of the work described in this paper in 300 words.

2) DNS is a good example of a large-scale distributed system.
   (a) Find out how DNS works and write a 300-word summary.
   (b) Provide an estimate for the number of DNS lookups which occur on the Internet in a typical day. How many queries do you think a single 4-core server could support?
   (c) Describe why the architecture of DNS can support the rate of lookups you estimated in part (b).
   (d) Does DNS support strong or weak consistency? Why? How might you demonstrate this one way or another?

3) You have been given a cluster of standard PCs, all connected via Ethernet, and the raw location data from 20 million vehicles in the UK for the last five full calendar years. Each location sample of a vehicle is stored as four 32-bit values representing: , , ,
   A location sample is captured every second the vehicle's engine is running. You have been asked to use the Map-Reduce framework to generate summary statistics of the data.
   (a) Estimate the total number of samples in the data set.
   (b) Describe how you would divide up the raw data and distribute it across the hard drives in the PCs.
   (c) Describe how you would use the Map-Reduce framework to distribute the processing required to calculate the following statistics:
       (i) The vehicle which drove the most miles
       (ii) The 20 most frequently visited locations in the UK
       (iii) The minimum, average and maximum speed on each 1 km stretch of UK motorway for each hour of each day in the week
   (d) How many PCs will you need in the cluster to calculate the statistics in a timely fashion?

5) Describe what role-based access control is. As the managing director of a large bank, describe how you could use role-based access control to manage the risk posed by corrupt employees.
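
A possible starting point for the arithmetic in question 3(a) is sketched below in Python. It only encodes what the question states (20 million vehicles, five years, one sample per second of engine running time, four 32-bit values per sample). The function names, the day count that ignores leap days, and above all the figure of one hour of engine time per vehicle per day are assumptions made for illustration, not values given in the question; substitute your own and justify them.

```python
import struct

# Four unsigned 32-bit values per sample, as stated in the question.
# The meaning of each field is deliberately not assumed here.
SAMPLE_FORMAT = "<4I"
SAMPLE_BYTES = struct.calcsize(SAMPLE_FORMAT)


def estimate_samples(vehicles, days, engine_seconds_per_day):
    """Total samples, at one sample per second of engine running time."""
    return vehicles * days * engine_seconds_per_day


def estimate_bytes(samples):
    """Raw storage needed for the given number of fixed-size samples."""
    return samples * SAMPLE_BYTES


if __name__ == "__main__":
    vehicles = 20_000_000           # from the question
    days = 5 * 365                  # ASSUMPTION: five years, leap days ignored
    engine_seconds_per_day = 3600   # ASSUMPTION: one hour of driving per day

    samples = estimate_samples(vehicles, days, engine_seconds_per_day)
    print(f"samples: {samples:.3e}")
    print(f"raw size: {estimate_bytes(samples) / 1e15:.2f} PB")
```

Changing `engine_seconds_per_day` shows how sensitive the total is to that one assumption, which in turn feeds directly into the cluster sizing asked for in part (d).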