# L41: Lab 1 - Getting started with kernel tracing - I/O

The goals of this lab are to:
    
- Introduce you to our experimental environment and DTrace.
- Have you explore user-kernel interactions via system calls and traps.
- Gain experience tracing I/O behaviour in UNIX.
- Build intuitions about the probe effect.

You will do this using DTrace to analyse the behaviour of a potted, kernel-intensive block-I/O benchmark.

---
## Running the benchmark

Once built, you can run the benchmark binaries as follows, with the command-line arguments specifying various benchmark parameters:

In [None]:
# Execute the io-static benchmark displaying its command line options
!io/io-static

In [None]:
# Execute the io-dynamic benchmark displaying its command line options
!io/io-dynamic

### Example benchmark commands

This command creates a default-sized data file in the `/data` filesystem:

In [None]:
# Example benchmark command
print_header("Creating file to run benchmark")

!io/io-static -c iofile
    
print_footer("Completed")

This command runs a simple `read()` benchmark on the data file, printing additional information about the benchmark run:

In [None]:
# Example benchmark command
print_header("Running benchmark")

!io/io-static -v -r iofile

print_footer("Completed")

This command runs a simple `write()` benchmark on the data file, printing additional information about the benchmark run:

In [None]:
# Example benchmark command
print_header("Running benchmark")

!io/io-static -v -w iofile

print_footer("Completed")

If performing whole-program analysis using DTrace, be sure to suppress output (`-q`) and run the benchmark in bare (`-B`) mode:

In [None]:
# Example benchmark
print_header("Running benchmark")

!io/io-static -B -q -r iofile

print_footer("Completed")

The following command disables use of the buffer cache when running a read benchmark; be sure to discard the output of the first run of this command:

In [None]:
# Example benchmark command
print_header("Running benchmark")

!io/io-static -d -r iofile

print_footer("Completed")

To better understand kernel behaviour, you may also wish to run the benchmark against `/dev/zero`, a pseudo-device that returns all zeroes and discards all writes:

In [None]:
# Example benchmark command
print_header("Running benchmark")

!io/io-static -r /dev/zero

print_footer("Completed")

To get a high-level summary of execution time, including a breakdown of total wall-clock time, time in userspace, and 'system time', use the UNIX `time` command:

In [None]:
# Example benchmark command
print_header("Running benchmark")

!time -p io/io-static -r -B -d -q iofile

print_footer("Completed")
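
With `-p`, `time` reports three lines in the POSIX format (`real`, `user`, `sys`, each followed by a value in seconds). The sketch below shows one way such output could be turned into numbers for comparison across runs. The helper name `parse_time_p` and the sample string are illustrative assumptions, not part of the lab code, and the sample values are placeholders rather than measurements.

```python
def parse_time_p(output):
    """Parse POSIX `time -p` output into a dict of seconds (hypothetical helper)."""
    times = {}
    for line in output.splitlines():
        fields = line.split()
        # Only keep the three lines that `time -p` is specified to emit
        if len(fields) == 2 and fields[0] in ("real", "user", "sys"):
            times[fields[0]] = float(fields[1])
    return times


# Placeholder text in the `time -p` format, for illustration only
sample = "real 2.50\nuser 0.25\nsys 1.75\n"

times = parse_time_p(sample)
on_cpu = times["user"] + times["sys"]   # time spent executing, user plus kernel
off_cpu = times["real"] - on_cpu        # remainder of the wall-clock time
print(times, on_cpu, off_cpu)
```

For a single-threaded benchmark, a large gap between `real` and `user + sys` suggests that the process spent much of its time off the CPU, for example waiting for I/O.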

---
## Exploratory questions

### 1. Baseline benchmarks and performance analysis:

- How do `read()` and `write()` performance compare?

In [None]:
# Perform setup for the read and write io performance benchmarks.
# NOTE: This cell must be executed before the read or write performance
# benchmarks can be run

# D Language script
# The time taken by the io-static benchmark (the difference between two
# vtimestamp readings, in nanoseconds) is printed as a JSON string
# in the format: {"timestamp": <nanoseconds>}
# A JSON-formatted string can be easily converted into Python objects using json.loads().
io_performance_script = """
BEGIN {
   self->targetPid = -1;
}

proc:::exec-success
/execname == "io-static"/
{
   self->targetPid = pid;
   self->start = vtimestamp;
}

syscall::exit:entry
/pid == self->targetPid/
{
   self->targetPid = -1;
   printf("{\\"timestamp\\": %u}", vtimestamp - self->start);
}

END
{
    trace("Stopped");
    /* You might print summary statistics here */
    exit(0);
}
"""

# Buffer sizes to compute the performance for
BUFFER_SIZES = [512 * 2 ** exp for exp in range(0, 16)]

# Total size of iofile (default size) = 16MiB
TOTAL_SIZE = BUFFER_SIZES[-1] #16*1024*1024

# Number of trials for each buffer size
NUM_TRIALS = 2
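
As a worked example of the arithmetic used later in this section, the sketch below converts one line of script output into an I/O bandwidth in KiBytes/sec. The function name `bandwidth_kib_per_sec` and the sample line are assumptions made for illustration; the sample is a placeholder, not a measurement.

```python
import json

TOTAL_SIZE = 16 * 1024 * 1024  # 16MiB, matching the default iofile size


def bandwidth_kib_per_sec(json_line, total_bytes=TOTAL_SIZE):
    """Convert one '{"timestamp": <ns>}' line into KiBytes/sec (hypothetical helper)."""
    record = json.loads(json_line)
    seconds = record["timestamp"] / 1e9       # nanoseconds -> seconds
    return (total_bytes / 1024.0) / seconds   # bytes -> KiBytes, then per second


# Placeholder line in the format printed by the D script: 2 seconds
sample_line = '{"timestamp": 2000000000}'
print(bandwidth_kib_per_sec(sample_line))  # 16384 KiBytes / 2 s = 8192.0
```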

#### Read performance (against buffer size)

In [None]:
# Start the DTrace instrumentation
start_instr_benchmark(io_performance_script)

# Display header to indicate that the benchmarking has started
print_header(["Starting io-static read performance measurement",
    "Note: with the buffer cache disabled (-d) and small buffer sizes,",
    "the benchmark can take a long time…"])

for buffer_size in BUFFER_SIZES:
    print "Computing performance for buffer size = {} bytes (max = {}) ".format(buffer_size, BUFFER_SIZES[-1])
    for trial in range(0, NUM_TRIALS):
        # Display the progress for the given buffer size
        sys.stdout.write("\r[%-{}s] %d%%".format(NUM_TRIALS) % ('='*trial, trial*(100/NUM_TRIALS)))
        sys.stdout.flush()
         
        # Run the io-static benchmark
        !io/io-static -r -B -q -b {str(buffer_size)} -t {str(TOTAL_SIZE)} iofile
    sys.stdout.write("\r")

# The benchmark has completed - stop the DTrace instrumentation
stop_instr_benchmark()

# Read the completion times for each run of the benchmark
read_performance_values = get_instr_values()

# Display footer to indicate that the benchmark has finished
print_footer(["Finished io-static read performance measurement"])

#### Plot read performance

In [None]:
%matplotlib inline

# Plot the read performance (IO bandwidth against buffer size with error bars)
print_header(["Display the plot (this takes approximately 30 secs on the BBB)"])
 
# Compute the IO bandwidth in KiBytes/sec
read_io_bandwidth_values = [(TOTAL_SIZE/1024)/(val["timestamp"]/1e9) for val in read_performance_values]

# Reshape the list into an array of size [len(BUFFER_SIZES), NUM_TRIALS]
read_io_bandwidth = np.reshape(read_io_bandwidth_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]

# Convert the array of I/O bandwidth values into a pandas DataFrame;
# this allows plotting of the median value and computation of the
# error bars (25th and 75th percentile values)
read_df = pd.DataFrame(read_io_bandwidth, index=BUFFER_SIZES)

# Compute error bars based on the 25th and 75th percentile values
# Note: The error bars should be small indicating that the experiment is tightly controlled
read_error_bars = read_df.quantile([.25, .75], axis=1)
read_error_bars.loc[[0.25]] = read_df.median(1) - read_error_bars.loc[[0.25]]
read_error_bars.loc[[0.75]] = read_error_bars.loc[[0.75]] - read_df.median(1)
read_error_bars_values = [read_error_bars.values]

# Create and label the plot
plt.figure(); read_df.median(1).plot(figsize=(9,9), yerr=read_error_bars_values, label="io-static read")
plt.title('io-static read performance')
plt.ylabel('I/O bandwidth (KiBytes/sec)')
plt.xlabel('Buffer size (Bytes)')
plt.xscale('log')

# Optionally plot a vertical line at a buffer size of interest (here 16KiB)
# plt.axvline(x=16*1024, color='g')

# Display the plot
plt.show()

# Save the plot to a file on the BBB
# plt.savefig("io_static_read_performance.pdf")
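
The error bars in this plot are asymmetric: each is the distance from the median down to the 25th percentile and up to the 75th percentile. The sketch below reproduces that calculation for one buffer size using only the standard library. The helpers `percentile` and `error_bar_lengths` are hand-written for illustration (they use linear interpolation between ranks, which is an assumption about what the plotting cell computes by default), and the sample values are placeholders.

```python
def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between neighbouring ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def error_bar_lengths(trials):
    """Return (distance below median, distance above median) for one buffer size."""
    median = percentile(trials, 50)
    return median - percentile(trials, 25), percentile(trials, 75) - median


# Placeholder bandwidths (KiBytes/sec) for two trials, for illustration only
trials = [1000.0, 1200.0]
print(error_bar_lengths(trials))
```

With only two trials per buffer size, both percentiles are interpolated between the same two values, so increasing `NUM_TRIALS` gives more meaningful error bars.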

#### Write performance (against buffer size)

In [None]:
# Start the DTrace instrumentation
start_instr_benchmark(io_performance_script)

# Display header to indicate that the benchmarking has started
print_header(["Starting io-static write performance measurement",
    "Note: with the buffer cache disabled (-d) and small buffer sizes,",
    "the benchmark can take a long time…"])
  
for buffer_size in BUFFER_SIZES:
    print "Computing performance for buffer size = {} bytes (max = {}) ".format(buffer_size, BUFFER_SIZES[-1])
    for trial in range(0, NUM_TRIALS):
        # Display the progress for the given buffer size
        sys.stdout.write("\r[%-{}s] %d%%".format(NUM_TRIALS) % ('='*trial, trial*(100/NUM_TRIALS)))
        sys.stdout.flush()
        
        # Run the io-static benchmark
        !io/io-static -w -B -q -b {str(buffer_size)} iofile
    sys.stdout.write("\r")
    
# The benchmark has completed - stop the DTrace instrumentation
stop_instr_benchmark()

# Read the completion times for each run of the benchmark
write_performance_values = get_instr_values()

# Display footer to indicate that the benchmark has finished
print_footer("Finished io-static write performance measurement")

#### Plot write performance

In [None]:
%matplotlib inline

# Plot the write performance (IO bandwidth against buffer size with error bars)
print_header(["Display the plot (this takes approximately 30 secs on the BBB)"])
    
# Compute the IO bandwidth in KiBytes/sec
write_io_bandwidth_values = [(TOTAL_SIZE/1024)/(val["timestamp"]/1e9) for val in write_performance_values]

# Reshape the list into an array of size [len(BUFFER_SIZES), NUM_TRIALS]
write_io_bandwidth = np.reshape(write_io_bandwidth_values, (len(BUFFER_SIZES), NUM_TRIALS))[:,:]

# Convert the array of I/O bandwidth values into a pandas DataFrame;
# this allows plotting of the median value and computation of the
# error bars (25th and 75th percentile values)
write_df = pd.DataFrame(write_io_bandwidth, index=BUFFER_SIZES)

# Compute error bars based on the 25th and 75th percentile values
# Note: The error bars should be small indicating that the experiment is tightly controlled
write_error_bars = write_df.quantile([.25, .75], axis=1)
write_error_bars.loc[[0.25]] = write_df.median(1) - write_error_bars.loc[[0.25]]
write_error_bars.loc[[0.75]] = write_error_bars.loc[[0.75]] - write_df.median(1)
write_error_bars_values = [write_error_bars.values]

# Create and label the plot
plt.figure(); write_df.median(1).plot(figsize=(9,9), yerr=write_error_bars_values, label="io-static write")
plt.title('io-static write performance')
plt.ylabel('I/O bandwidth (KiBytes/sec)')
plt.xlabel('Buffer size (Bytes)')
plt.xscale('log')

# Display the plot
plt.show()

# Save the plot to a file on the BBB
# plt.savefig("io_static_write_performance.pdf")
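
To compare `read()` and `write()` performance directly, one option is to take the ratio of the median bandwidths at each buffer size. The sketch below assumes the medians are available as dictionaries keyed by buffer size. The function name `read_write_ratio` is hypothetical and the numbers are placeholders, not results from the benchmark.

```python
def read_write_ratio(read_medians, write_medians):
    """Map each buffer size to median read bandwidth / median write bandwidth."""
    return {size: read_medians[size] / float(write_medians[size])
            for size in sorted(read_medians)
            if size in write_medians}


# Placeholder medians (KiBytes/sec) keyed by buffer size in bytes
read_medians = {512: 1000.0, 1024: 3000.0}
write_medians = {512: 500.0, 1024: 2000.0}

ratios = read_write_ratio(read_medians, write_medians)
for size in sorted(ratios):
    print("{} bytes: read/write = {:.2f}".format(size, ratios[size]))
```

A ratio above 1 means reads were faster than writes at that buffer size. In the notebook, dictionaries of this shape could be obtained with, for example, `read_df.median(1).to_dict()`.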