Distributed Computing
What is Distributed Computing?
Distributed Computing involves the breaking down a computational problem into several parallel tasks to be completed by two or more computers in a network which form a distributed system. This combines the computational power of several computers to solve large problems which involve the processing of large data or require a huge number of iterations.
Currently, there are several ongoing large-scale Distributed Computing projects spanning various fields which allow computers from all over the world to participate in, many of which have been running for years. This shows how large computational problems these days can be!
Here are some examples:
-
Great Internet Mersenne Prime Search (Math)
Search for Mersenne Prime numbers, which are prime numbers that are in the form 2n - 1, or have only '1's in binary.
-
Test4Theory (Physics)
Search for new fundamental particles at CERN's Large Hadron Collider.
-
Folding@Home (Molecular Biology)
Run simulations on the folding of proteins, which aid the study of diseases such as Alzheimer's, Huntington's, and many cancers.
-
QMC@Home (Chemistry)
Study the structure and reactivity of molecules using quantum chemistry and Monte Carlo techniques.
-
MindModelling@Home (Cognitive Science)
Use computational cognitive process modelling to better understand the mind.
-
MilkyWay@Home (Astronomy)
Create a highly accurate 3D model of the Milky Way galaxy.
-
Enigma@Home (Cryptography)
Break three original Enigma message which have not been broken.
Demonstrating distributed computing using 2 Pis
For demonstration purposes, I shall connect 2 Raspberry Pis using an Ethernet cable and perform a simple merge sort on a large array of elements.
Firstly, here's the code for a merge sort algorithm written in python.
import time
def merge(left,right): #merges 2 sorted lists together
result = []
i, j = 0, 0
#Goes through both lists
while i < len(left) and j < len(right):
#Adds smaller element of the lists to the final list
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result += left[i:]
result += right[j:]
return result
def mergesort(lst):
#if there's only 1 element, no need to sort
if len(lst) < 2:
return lst
#breaks down list into 2 halves
middle = len(lst) / 2
#recursively splits and sorts each half
left = mergesort(lst[:middle])
right = mergesort(lst[middle:])
#merges both sorted lists together
return merge(left, right)
Next, here's the code for using the merge sort algorithm to sort an array of 100000 elements using 1 Raspberry Pi.
(This code will be used in the following programs as well so have them in the same directory before running them!)
import MergeSort #Imports mergesort functions
import random
import time
#Create an array to be sorted
arraylength = 100000 #Length of array to be sorted
print 'Length of array is', arraylength
array = range(arraylength) #Creates array
random.shuffle(array) #Jumbles up array
#Sort and time sorting process
start_time = time.time() #Records start time
print 'Sorting array...'
array = MergeSort.mergesort(array) #Sorts array
print 'Array sorted.'
time_taken = time.time() - start_time #Calculates and records time_taken
print 'Time taken to sort is ', time_taken, 'seconds.'
With only one Raspberry Pi performing the task, it takes about 24 seconds to complete the task.
Now, let's distribute the task to 2 Raspberry Pis but to do so we first have to set the IP addresses of each Raspberry Pi.
For the first Pi, type this in the command line, configuring its IP address to 192.168.1.1, and this Pi will act as the Server in the network.
sudo ifconfig eth0 192.168.1.1 broadcast 192.168.1.255 netmask 255.255.255.0
Similarly, type the following for the second Pi, configuring its IP address to 192.168.1.2, and this Pi will act as the Client.
sudo ifconfig eth0 192.168.1.2 broadcast 192.168.1.255 netmask 255.255.255.0
For the first Pi, run the following code.
import socket
import MergeSort #Imports mergesort functions
import random
import time
#breaks down array into n sections where n is the number of processors
def breakarray(array, n):
sectionlength = len(array)/n #length of each section
result = []
for i in range(n):
if i < n - 1:
result.append( array[ i * sectionlength : (i+1) * sectionlength ] )
#include all remaining elements for the last section
else:
result.append( array[ i * sectionlength : ] )
return result
#Create an array to be sorted
arraylength = 100000 #Length of array to be sorted
print 'Length of array is', arraylength
array = range(arraylength) #Creates array
random.shuffle(array) #Jumbles up array
#Specify info on processors/computers
procno = 2 #number of processors
print 'Number of processors:', procno
procID = 0 #ID of this processor(server)
addr_list = [] #list of client addresses
#Sets up network
HOST = ''
PORT = 50007
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST, PORT))
s.listen(procno - 1) #Listens for (n) number of client connections
print 'Waiting for client...'
for i in range(procno - 1): #Connects to all clients
conn, addr = s.accept() #Accepts connection from client
print 'Connected by', addr
addr_list.append(addr) #Adds address to address list
#Start and time distributed computing sorting process
start_time = time.time() #Records start time
sections = breakarray(array, procno) #splits array into sections for every client
for i in range(procno - 1): #Converts array section into string to be sent
arraystring = repr(sections[i+1])
conn.sendto( arraystring , addr_list[i] ) #Sends array string
print 'Data sent, sorting array...'
array = MergeSort.mergesort(sections[procID]) #Sorts section and stores it in array
print 'Array sorted.'
for i in range(procno - 1): #Receives sorted sections from each client
arraystring = ''
print 'Receiving data from clients...'
while 1:
data = conn.recv(4096) #Receives data in chunks
arraystring += data #Adds data to array string
if ']' in data: #When end of data is received
break
print 'Data received, merging arrays...'
array = MergeSort.merge(array, eval(arraystring)) #Merges current array with section from client
print 'Arrays merged.'
conn.close()
time_taken = time.time() - start_time #Calculates and records time_taken
print 'Time taken to sort is ', time_taken, 'seconds.'
When the line "Waiting for client..." is printed on the first Pi's command line, run the following code on the second Pi.
import socket
import MergeSort
HOST = '192.168.1.1'
PORT = 50007
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
#Receives arraystring in chunks
arraystring = ''
print 'Receiving data...'
while 1:
data = s.recv(4096) #Receives data in chunks
#print data
arraystring += data #Adds data to array string
if ']' in data: #When end of data is received
break
array = eval(arraystring)
print 'Data received, sorting array... '
#Sorts the array which it is allocated
array = MergeSort.mergesort(array)
print 'Array sorted, sending data...'
#Converts array into string to be sent back to server
arraystring = repr(array)
s.sendall(arraystring) #Sends array string
print 'Data sent.'
s.close()
The time taken to sort the array has decreased to about 16 seconds, which is not a 2-fold decrease due to the overhead in processing and transferring of the data between the 2 Pis. This increase in speedup will be more prominent with the use of more Raspberry Pis by connecting them via a hub.
What now?
Hopefully this really short tutorial gives you a glimpse of what distributed computing is about and how to implement it simply. As exemplified there are many applications of this and perhaps you could start your own distributed computing project using your Raspberry Pi (and a friend's or friends')!