# L41: Lab 3 - Micro-Architectural implications of IPC

The goals of this lab are to:

- Introduce hardware performance counters (hwpmc)
- Explore micro-architectural implications of IPC
- Gather additional data to support the writing of your first assessed lab report

You will do this by using hardware performance counters (PMC) to analyse the behaviour of the same potted, kernel-intensive IPC benchmark used in the last lab.

## Running the benchmark

As before, you can run the benchmark using the ipc-static and ipc-dynamic commands, specifying various benchmark parameters. When the new performance-counter argument is used, additional information will be printed about the processor-level behaviour of the IPC loop. Do ensure that, as in Lab 2, you have increased the kernel’s maximum socket-buffer size.

### Example benchmark commands

This command instructs the IPC benchmark to capture information on memory instructions issued when operating on a socket with a 512-byte buffer from a single thread:

In [None]:
# Example benchmark command
print_header("Capturing info on memory instructions")

!ipc/ipc-static -i local -b 512 -P mem 1thread
    
print_footer("Completed")

With some careful processing, the output of the IPC benchmark can be manipulated in Python, as shown below (this would of course be easier if the benchmarks provided JSON output!):
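The text-to-JSON conversion used by the measurement cells can be tried in isolation first. The following is a minimal sketch, assuming the benchmark prints one `NAME: value` pair per line and that the last two lines of output are not counter lines (as the `output[:-2]` slice in this notebook implies). The helper name `parse_pmc_output` and the sample lines are hypothetical; real counter names and values will differ.

```python
import json
import re

def parse_pmc_output(lines):
    """Convert 'NAME: value' lines into a dict mapping names to floats."""
    # Drop the last two lines (assumed not to be counters) and skip blank lines
    body = ",".join(str(line) for line in lines[:-2] if line)
    # Wrap every token (names and numbers) in quotes so the text is valid JSON
    quoted = re.sub(r'([a-zA-Z_/0-9.]+)', r'"\1"', "{" + body + "}")
    # The quoting turns numbers into strings, so convert them back explicitly
    return {name: float(value) for name, value in json.loads(quoted).items()}

# Hypothetical sample output, for illustration only
sample = [
    "INSTR_EXECUTED: 1000",
    "MEM_READ: 250",
    "MEM_READ/INSTR_EXECUTED: 0.25",
    "MEM_WRITE: 100",
    "",
    "1234.56 KBytes/sec",
]
counters = parse_pmc_output(sample)
print("MEM_READ = {}, MEM_WRITE = {}".format(counters["MEM_READ"], counters["MEM_WRITE"]))
```

Note that the regular expression quotes the numbers as well as the names, so the values come back from `json.loads` as strings; converting them to `float` avoids surprises when the data is later reshaped and plotted.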

## Exploratory questions

- How do requested memory accesses vary across our six benchmark configurations (IPC type × operational mode)?
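One way to enumerate the configurations is sketched below. It assumes that the six configurations are the two IPC types (`pipe`, `local`) crossed with three operational modes; `1thread` and `2thread` appear in this notebook, while `2proc` is assumed from the previous lab. The helper name `benchmark_commands` is hypothetical, and the sketch only builds command strings rather than running them.

```python
from itertools import product

IPC_TYPES = ["pipe", "local"]
# Assumption: 2proc is the third operational mode, as in the previous lab
MODES = ["1thread", "2thread", "2proc"]

def benchmark_commands(buffer_size, binary="ipc/ipc-static"):
    """Build one '-P mem' command line per (IPC type, mode) pair."""
    return ["{} -i {} -b {} -P mem {}".format(binary, ipc, buffer_size, mode)
            for ipc, mode in product(IPC_TYPES, MODES)]

for command in benchmark_commands(512):
    print(command)
```

The cells that follow measure only the `2thread` mode; a list like this is a starting point for extending the measurements to the remaining configurations.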

In [None]:
import json
import re

# Buffer sizes to measure (512 bytes to 16MiB, in powers of two)
# Note: Per-process resource limits prevent very large buffers with -i local -s
BUFFER_SIZES = [512 * 2 ** exp for exp in range(0, 16)]

# Number of trials for each buffer size
NUM_TRIALS = 1

ipc_pipe_mem_read_values = []
ipc_pipe_mem_write_values = []

ipc_socket_default_mem_read_values = []
ipc_socket_default_mem_write_values = []

ipc_socket_mem_read_values = []
ipc_socket_mem_write_values = []

print_header("Capturing info on memory instructions")

for buffer_size in BUFFER_SIZES:
    print("Measuring performance for buffer size = {} bytes (max = {})".format(buffer_size, BUFFER_SIZES[-1]))
    for trial in range(0, NUM_TRIALS):
        # -i pipe
        output = !ipc/ipc-static -i pipe -b {str(buffer_size)} -P mem 2thread
        
        # Convert the PMC output into JSON, to simplify post-processing
        output_json = json.loads(re.sub(r'([a-zA-Z_/0-9.]+)', r'"\1"',
            "{" + ','.join(str(e) for e in output[:-2] if e) +"}"))

        # The regex above quotes the numbers too, so convert them back to floats
        ipc_pipe_mem_read_values.append(float(output_json["MEM_READ"]))
        ipc_pipe_mem_write_values.append(float(output_json["MEM_WRITE"]))
        
        # -i local
        output = !ipc/ipc-static -i local -b {str(buffer_size)} -P mem 2thread
             
        # Convert the PMC output into JSON, to simplify post-processing
        output_json = json.loads(re.sub(r'([a-zA-Z_/0-9.]+)', r'"\1"',
            "{" + ','.join(str(e) for e in output[:-2] if e) +"}"))

        # The regex above quotes the numbers too, so convert them back to floats
        ipc_socket_default_mem_read_values.append(float(output_json["MEM_READ"]))
        ipc_socket_default_mem_write_values.append(float(output_json["MEM_WRITE"]))

        # -i local -s
        output = !ipc/ipc-static -i local -s -b {str(buffer_size)} -P mem 2thread
           
        # Convert the PMC output into JSON, to simplify post-processing
        output_json = json.loads(re.sub(r'([a-zA-Z_/0-9.]+)', r'"\1"',
            "{" + ','.join(str(e) for e in output[:-2] if e) +"}"))

        # The regex above quotes the numbers too, so convert them back to floats
        ipc_socket_mem_read_values.append(float(output_json["MEM_READ"]))
        ipc_socket_mem_write_values.append(float(output_json["MEM_WRITE"]))
        
print_footer("Completed")

In [None]:
%matplotlib inline

# Reshape the list into arrays of size [len(BUFFER_SIZES), NUM_TRIALS]
ipc_pipe_mem_read = np.reshape(ipc_pipe_mem_read_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]
ipc_pipe_mem_write = np.reshape(ipc_pipe_mem_write_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]

ipc_socket_default_mem_read = np.reshape(ipc_socket_default_mem_read_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]
ipc_socket_default_mem_write = np.reshape(ipc_socket_default_mem_write_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]

ipc_socket_mem_read = np.reshape(ipc_socket_mem_read_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]
ipc_socket_mem_write = np.reshape(ipc_socket_mem_write_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]

# Convert the arrays of memory read/write counts into pandas DataFrames;
# this allows plotting of the median value and computation of the
# error bars (25th and 75th percentile values)
# Note: The error bars should be small, indicating that the experiment is tightly controlled
ipc_pipe_mem_read_df = pd.DataFrame(ipc_pipe_mem_read, index=BUFFER_SIZES)
ipc_pipe_mem_write_df = pd.DataFrame(ipc_pipe_mem_write, index=BUFFER_SIZES)

ipc_socket_default_mem_read_df = pd.DataFrame(ipc_socket_default_mem_read, index=BUFFER_SIZES)
ipc_socket_default_mem_write_df = pd.DataFrame(ipc_socket_default_mem_write, index=BUFFER_SIZES)

ipc_socket_mem_read_df = pd.DataFrame(ipc_socket_mem_read, index=BUFFER_SIZES)
ipc_socket_mem_write_df = pd.DataFrame(ipc_socket_mem_write, index=BUFFER_SIZES)

# Compute error bars as distances from the median to the 25th and 75th percentiles
ipc_pipe_mem_read_error_bars = ipc_pipe_mem_read_df.quantile([.25, .75], axis=1)
ipc_pipe_mem_read_error_bars.loc[[0.25]] = ipc_pipe_mem_read_df.median(1) - ipc_pipe_mem_read_error_bars.loc[[0.25]]
ipc_pipe_mem_read_error_bars.loc[[0.75]] = ipc_pipe_mem_read_error_bars.loc[[0.75]] - ipc_pipe_mem_read_df.median(1)
ipc_pipe_mem_read_error_bars_values = [ipc_pipe_mem_read_error_bars.values]

ipc_pipe_mem_write_error_bars = ipc_pipe_mem_write_df.quantile([.25, .75], axis=1)
ipc_pipe_mem_write_error_bars.loc[[0.25]] = ipc_pipe_mem_write_df.median(1) - ipc_pipe_mem_write_error_bars.loc[[0.25]]
ipc_pipe_mem_write_error_bars.loc[[0.75]] = ipc_pipe_mem_write_error_bars.loc[[0.75]] - ipc_pipe_mem_write_df.median(1)
ipc_pipe_mem_write_error_bars_values = [ipc_pipe_mem_write_error_bars.values]

ipc_socket_default_mem_read_error_bars = ipc_socket_default_mem_read_df.quantile([.25, .75], axis=1)
ipc_socket_default_mem_read_error_bars.loc[[0.25]] = ipc_socket_default_mem_read_df.median(1) - ipc_socket_default_mem_read_error_bars.loc[[0.25]]
ipc_socket_default_mem_read_error_bars.loc[[0.75]] = ipc_socket_default_mem_read_error_bars.loc[[0.75]] - ipc_socket_default_mem_read_df.median(1)
ipc_socket_default_mem_read_error_bars_values = [ipc_socket_default_mem_read_error_bars.values]

ipc_socket_default_mem_write_error_bars = ipc_socket_default_mem_write_df.quantile([.25, .75], axis=1)
ipc_socket_default_mem_write_error_bars.loc[[0.25]] = ipc_socket_default_mem_write_df.median(1) - ipc_socket_default_mem_write_error_bars.loc[[0.25]]
ipc_socket_default_mem_write_error_bars.loc[[0.75]] = ipc_socket_default_mem_write_error_bars.loc[[0.75]] - ipc_socket_default_mem_write_df.median(1)
ipc_socket_default_mem_write_error_bars_values = [ipc_socket_default_mem_write_error_bars.values]

ipc_socket_mem_read_error_bars = ipc_socket_mem_read_df.quantile([.25, .75], axis=1)
ipc_socket_mem_read_error_bars.loc[[0.25]] = ipc_socket_mem_read_df.median(1) - ipc_socket_mem_read_error_bars.loc[[0.25]]
ipc_socket_mem_read_error_bars.loc[[0.75]] = ipc_socket_mem_read_error_bars.loc[[0.75]] - ipc_socket_mem_read_df.median(1)
ipc_socket_mem_read_error_bars_values = [ipc_socket_mem_read_error_bars.values]

ipc_socket_mem_write_error_bars = ipc_socket_mem_write_df.quantile([.25, .75], axis=1)
ipc_socket_mem_write_error_bars.loc[[0.25]] = ipc_socket_mem_write_df.median(1) - ipc_socket_mem_write_error_bars.loc[[0.25]]
ipc_socket_mem_write_error_bars.loc[[0.75]] = ipc_socket_mem_write_error_bars.loc[[0.75]] - ipc_socket_mem_write_df.median(1)
ipc_socket_mem_write_error_bars_values = [ipc_socket_mem_write_error_bars.values]

# Create and label the plot
fig, read = plt.subplots(2, sharex=True)

ipc_pipe_mem_read_df.median(1).plot(ax=read[0], figsize=(9,9), yerr=ipc_pipe_mem_read_error_bars_values, label="mem read", color='b', legend=True)
ipc_pipe_mem_write_df.median(1).plot(ax=read[0], figsize=(9,9), yerr=ipc_pipe_mem_write_error_bars_values,  label="mem write", color='g', linestyle='--', legend=True)

ipc_socket_default_mem_read_df.median(1).plot(ax=read[1], figsize=(9,9), yerr=ipc_socket_default_mem_read_error_bars_values, label="mem read", linestyle='solid', color='b', legend=True)
ipc_socket_default_mem_write_df.median(1).plot(ax=read[1], figsize=(9,9), yerr=ipc_socket_default_mem_write_error_bars_values,  label="mem write", color='g', linestyle='dashed', legend=True)
ipc_socket_mem_read_df.median(1).plot(ax=read[1], figsize=(9,9), yerr=ipc_socket_mem_read_error_bars_values, label="mem read -s", color='r', linestyle='dashdot', legend=True)
ipc_socket_mem_write_df.median(1).plot(ax=read[1], figsize=(9,9), yerr=ipc_socket_mem_write_error_bars_values,  label="mem write -s", color='c', linestyle='dotted', legend=True)

read[0].set_title('ipc-static -i pipe 2thread')
read[0].set_xlabel('Buffer size (Bytes)')
read[0].set_ylabel('Count')
read[0].set_xscale('log')
# Plot a vertical line at 8KiB
read[0].axvline(x=8*1024, color='r', linestyle='dotted')

read[1].set_title('ipc-static -i local 2thread')
read[1].set_xlabel('Buffer size (Bytes)')
read[1].set_ylabel('Count')
read[1].set_xscale('log')

# Display the plot
# (this can take a while (~30 secs) on the BeagleBone Black)
plt.show()

# Optionally, save the plot to a file
# (this can take a while (~30 secs) on the BeagleBone Black)
#plt.savefig("ipc_static_socket_2thread_mem.pdf")
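The error-bar arithmetic repeated throughout the plotting cell reduces to three numbers per buffer size: the median, the distance from the median down to the 25th percentile, and the distance up to the 75th percentile. The following plain-Python sketch illustrates that calculation without pandas. The function names are hypothetical, and the percentile uses linear interpolation between neighbouring samples, which is assumed here to match the default behaviour of `DataFrame.quantile`.

```python
def percentile(values, fraction):
    """Linearly interpolated percentile, with fraction between 0 and 1."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

def median_with_error_bars(trials):
    """Return (median, distance down to 25th, distance up to 75th percentile)."""
    median = percentile(trials, 0.5)
    return (median,
            median - percentile(trials, 0.25),
            percentile(trials, 0.75) - median)

# Five hypothetical trial results for a single buffer size
print(median_with_error_bars([10, 20, 30, 40, 50]))
# A single trial has no spread, so both error bars collapse to zero
print(median_with_error_bars([7]))
```

The second call shows why `NUM_TRIALS = 1` produces no visible error bars: with one sample per buffer size, every percentile equals the median. Increasing the number of trials is needed before the error bars say anything about how well controlled the experiment is.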

- How does varying the buffer size (and, in the case of sockets, also setting the kernel socket-buffer size) affect the degree to which the L1 and L2 caches improve performance?
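When interpreting the bandwidth plot, it can help to know which buffer sizes could fit in each cache level at all. The sketch below is a rough aid under stated assumptions: it takes the BeagleBone Black's Cortex-A8 to have a 32KiB L1 data cache and a 256KiB L2 cache, and it ignores everything else competing for cache space (kernel buffers, the second copy of the data on the receiving side, code and stack). The constant and function names are hypothetical.

```python
# Assumed cache sizes for the BeagleBone Black's Cortex-A8; check your hardware
L1_DATA_CACHE_BYTES = 32 * 1024
L2_CACHE_BYTES = 256 * 1024

def smallest_fitting_level(buffer_size):
    """Name the smallest memory level that could hold a buffer of this size."""
    if buffer_size <= L1_DATA_CACHE_BYTES:
        return "L1"
    if buffer_size <= L2_CACHE_BYTES:
        return "L2"
    return "DRAM"

# Same buffer sizes as used by the measurement cells in this notebook
BUFFER_SIZES = [512 * 2 ** exp for exp in range(0, 16)]
for size in BUFFER_SIZES:
    print("{:>9} bytes -> {}".format(size, smallest_fitting_level(size)))
```

Because the real working set is larger than the user buffer alone, any change in measured bandwidth may appear at smaller buffer sizes than these thresholds suggest; treat the boundaries as upper limits rather than predictions.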

In [None]:
import re

# Buffer sizes to measure (512 bytes to 16MiB, in powers of two)
# Note: Per-process resource limits prevent larger buffer sizes
BUFFER_SIZES = [512 * 2 ** exp for exp in range(0, 16)]

# Total size of iofile (default size) = 16MiB
TOTAL_SIZE = BUFFER_SIZES[-1] #16*1024*1024

# Number of trials for each buffer size
NUM_TRIALS = 2

ipc_pipe_bandwidth_values = []
ipc_bandwidth_values = []
ipc_bandwidth_s_values = []
  
print_header("Capturing info on bandwidth")

for buffer_size in BUFFER_SIZES:
    print("Measuring performance for buffer size = {} bytes (max = {})".format(buffer_size, BUFFER_SIZES[-1]))
    for trial in range(0, NUM_TRIALS):
        # Pipe
        output = !ipc/ipc-static -i pipe -b {str(buffer_size)} 2thread
          
        # Extract and store the bandwidth (as reported by the benchmark)
        ipc_pipe_bandwidth_values.append(float(output[-1].split(" ")[0]))
        
        # local
        output = !ipc/ipc-static -i local -b {str(buffer_size)} 2thread
          
        # Extract and store the bandwidth (as reported by the benchmark)
        ipc_bandwidth_values.append(float(output[-1].split(" ")[0]))
            
        # local -s
        output = !ipc/ipc-static -i local -s -b {str(buffer_size)} 2thread
          
        # Extract and store the bandwidth (as reported by the benchmark)
        ipc_bandwidth_s_values.append(float(output[-1].split(" ")[0]))
        
print_footer("Completed")

In [None]:
%matplotlib inline

# Reshape the list into an array of size [len(BUFFER_SIZES), NUM_TRIALS]
ipc_bandwidth = np.reshape(ipc_bandwidth_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]
ipc_bandwidth_s = np.reshape(ipc_bandwidth_s_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]
ipc_pipe_bandwidth = np.reshape(ipc_pipe_bandwidth_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]

# Convert the arrays of I/O bandwidth values into pandas DataFrames;
# this allows plotting of the median value and computation of the
# error bars (25th and 75th percentile values)
# Note: The error bars should be small, indicating that the experiment is tightly controlled
ipc_bandwidth_df = pd.DataFrame(ipc_bandwidth, index=BUFFER_SIZES)
ipc_bandwidth_s_df = pd.DataFrame(ipc_bandwidth_s, index=BUFFER_SIZES)
ipc_pipe_bandwidth_df = pd.DataFrame(ipc_pipe_bandwidth, index=BUFFER_SIZES)

# Compute error bars as distances from the median to the 25th and 75th percentiles
ipc_bandwidth_error_bars = ipc_bandwidth_df.quantile([.25, .75], axis=1)
ipc_bandwidth_error_bars.loc[[0.25]] = ipc_bandwidth_df.median(1) - ipc_bandwidth_error_bars.loc[[0.25]]
ipc_bandwidth_error_bars.loc[[0.75]] = ipc_bandwidth_error_bars.loc[[0.75]] - ipc_bandwidth_df.median(1)
ipc_bandwidth_error_bars_values = [ipc_bandwidth_error_bars.values]

ipc_bandwidth_s_error_bars = ipc_bandwidth_s_df.quantile([.25, .75], axis=1)
ipc_bandwidth_s_error_bars.loc[[0.25]] = ipc_bandwidth_s_df.median(1) - ipc_bandwidth_s_error_bars.loc[[0.25]]
ipc_bandwidth_s_error_bars.loc[[0.75]] = ipc_bandwidth_s_error_bars.loc[[0.75]] - ipc_bandwidth_s_df.median(1)
ipc_bandwidth_s_error_bars_values = [ipc_bandwidth_s_error_bars.values]

ipc_pipe_bandwidth_error_bars = ipc_pipe_bandwidth_df.quantile([.25, .75], axis=1)
ipc_pipe_bandwidth_error_bars.loc[[0.25]] = ipc_pipe_bandwidth_df.median(1) - ipc_pipe_bandwidth_error_bars.loc[[0.25]]
ipc_pipe_bandwidth_error_bars.loc[[0.75]] = ipc_pipe_bandwidth_error_bars.loc[[0.75]] - ipc_pipe_bandwidth_df.median(1)
ipc_pipe_bandwidth_error_bars_values = [ipc_pipe_bandwidth_error_bars.values]

# Create and label the plot
plt.figure();
ipc_pipe_bandwidth_df.median(1).plot(figsize=(9,9), yerr=ipc_pipe_bandwidth_error_bars_values, label="pipe", color='b', legend=True)
ipc_bandwidth_df.median(1).plot(figsize=(9,9), yerr=ipc_bandwidth_error_bars_values, label="local", color='c', linestyle="dashed", legend=True)
ipc_bandwidth_s_df.median(1).plot(figsize=(9,9), yerr=ipc_bandwidth_s_error_bars_values, label="local -s", color='g', linestyle='dashdot', legend=True)
plt.title('ipc-static 2thread performance')
plt.ylabel('I/O bandwidth (KiBytes/sec)')
plt.xlabel('Buffer size (Bytes)')
plt.xscale('log')

# Plot a vertical line at 32KiB
plt.axvline(x=32*1024, color='r', linestyle="dotted")

# Display the plot
# (this can take a while (~30 secs) on the BeagleBone Black)
#plt.show()

# Save the plot to a file
# (this can take a while (~30 secs) on the BeagleBone Black)
plt.savefig("ipc_static_2thread_performance.pdf")