Course pages 2015–16
Advanced Operating Systems
Principal lecturer: Dr Robert Watson
Taken by: MPhil ACS, Part III
Code: L41
Hours: 16 (6 one-hour lectures, 5 two-hour practical labs)
Prerequisites: Undergraduate Operating Systems course; please see syllabus for further details
Aims
Systems research refers to the study of a broad range of behaviours arising from complex system design, including: low-level operating systems; resource sharing and scheduling; interactions between hardware and software; network-protocol design and implementation; separation of mutually distrusting parties on a common platform; and control of distributed-system behaviours such as concurrency and data replication. This module will:
- Teach systems-analysis methodology and practice through tracing and performance profiling experiments;
- Expose students to real-world systems artefacts such as OS schedulers and network stacks, and consider their hardware-software interactions with CPUs and network-interface cards;
- Develop scientific writing skills through a series of laboratory reports; and
- Assign a selection of original research papers to give insight into potential research topics and approaches.
The teaching style will blend lectures and hands-on labs that teach methodology, design principles, and practical skills. Students will be taught about (and assessed via) a series of lab-report-style assignments based on in- and out-of-classroom practical work. The systems studied are real, and all wires will be live.
Prerequisites
It is strongly recommended that students:
- Have previously (and successfully) completed an undergraduate operating-system course with practical content -- or have equivalent experience through project or open-source work.
- Have reasonable comfort with the C and Python programming languages. C is the primary implementation language for systems that we will analyse, requiring reading fluency; userspace C programs will also be written and extended as part of lab exercises. Python may prove useful as a data-processing language, and provides useful tools for data analysis and presentation.
- Review an undergraduate operating-system textbook (such as the 'Dinosaur Book') to ensure that basic OS concepts such as the process model, inter-process communication, filesystems, and virtual memory are familiar.
- Be comfortable with the UNIX command-line environment including compiler/debugging tools. Students without this background may wish to sit in on the undergraduate Unix Tools course in Michaelmas (starts 5 Nov, 11:00, LT1).
Syllabus
The sessions are split up into three submodules:
- Weeks 1-2: Introduction to kernels and kernel tracing/analysis
  The purpose of this submodule is to introduce students to the structure of a contemporary operating system kernel through tracing and profiling.
  Lecture 1: Introduction: OSes, Systems Research, and L41 (1h)
  Lecture 2: Kernels and Tracing (1h)
  Lab 1: POSIX I/O Performance (2h)
  Deliverable: Lab Report 1 - POSIX I/O Performance
- Weeks 3-5: Processors, processes, and threads
  This submodule introduces students to concrete implications of the UNIX process model: processes and threads in both userspace and kernelspace, the hardware foundations for kernel and process isolation, system calls, and traps.
  Lecture 3: The Process Model - 1 (1h)
  Lecture 4: The Process Model - 2 (1h)
  Lab 2: Kernel Implications of IPC (2h)
  Lab 3: Micro-Architectural Implications of IPC (2h)
  Deliverable: Lab Report 2 - Inter-Process Communication Performance
- Weeks 6-8: TCP/IP
  This submodule introduces students to a contemporary, multithreaded, multiprocessing network stack, with a particular interest in the TCP protocol. Labs will consider the behaviour of a single TCP connection, exploring the TCP state machine, socket-buffer interactions with flow control, and TCP congestion control. Students will use DUMMYNET to simulate network latency and explore how TCP slow start and congestion avoidance respond to network conditions. The second marked lab report will be written.
  Lecture 5: The Network Stack (1) (1h)
  Lecture 6: The Network Stack (2) (1h)
  Lab 4: The TCP State Machine (2h)
  Lab 5: TCP Latency and Bandwidth (2h)
  Deliverable: Lab Report 3 - The TCP State Machine, Latency, and Bandwidth
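As a rough illustration of the slow-start and congestion-avoidance behaviour mentioned for the TCP submodule, the following Python sketch models how a congestion window might evolve per round trip. It is an idealised, Reno-style toy model and not the FreeBSD implementation or any lab material; the function name `simulate_cwnd`, its parameters, and the simplified loss response are assumptions made for illustration.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds=frozenset()):
    """Return the congestion window (in segments) at the start of each round trip.

    Idealised model (assumptions, not a real stack):
    - slow start: the window doubles each round trip until it reaches ssthresh
    - congestion avoidance: the window grows by one segment per round trip
    - loss: ssthresh is set to half the window (minimum 2) and the window
      restarts from the new ssthresh
    """
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Multiplicative decrease on a (hypothetical) loss event.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: exponential growth, capped at ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(simulate_cwnd(rounds=8, ssthresh=8, loss_rounds={5}))
```

Real stacks count bytes rather than whole segments and react to duplicate ACKs and timeouts differently, so measured traces will not match this model exactly.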
Objectives
On completion of this module, students should:
- Have an understanding of high-level OS kernel structure
- Have gained insight into hardware-software interactions for compute and I/O
- Have practical skills in system tracing and performance analysis
- Have been exposed to research ideas in system structure and behaviour
- Have learned how to write systems-style performance evaluations
Coursework
Students will write and submit three lab reports to be marked by the instructor. The first report is a 'practice run' intended to help students develop analysis techniques and writing styles, and will not contribute to the final mark. The remaining two reports are marked and assessed, each constituting 50% of the final mark.
Practical work
Five 2-hour in-classroom labs will ask students to develop and use skills in tracing and performance analysis as applied to real-world systems artefacts. Results from these labs (and follow-up work by students outside of the classroom) will be the primary input to lab reports.
Typical labs will involve using tracing and profiling to characterise specific behaviours (e.g., file I/O in terms of system calls and traps) as well as diagnose and fix problems through modifications to application-level behaviours (e.g., modifying network clients and servers to better exploit real-world TCP behaviour). Students may find it useful to work in pairs within the lab, but must prepare lab reports independently.
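To give a flavour of characterising file I/O in terms of system calls, the sketch below reads a file with different buffer sizes and counts the read calls issued from userspace. It is only an illustration under assumptions: the labs themselves use kernel tracing and C programs, whereas this uses Python's `os.read()` wrapper, and the helper names `make_test_file` and `timed_read` are hypothetical.

```python
import os
import tempfile
import time


def make_test_file(size):
    """Create a temporary file of `size` zero bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


def timed_read(path, bufsize):
    """Read `path` in `bufsize` chunks; return (bytes_read, read_calls, seconds)."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    start = time.perf_counter()
    try:
        while True:
            chunk = os.read(fd, bufsize)  # one read() system call per iteration
            calls += 1
            if not chunk:  # an empty result signals end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls, time.perf_counter() - start


if __name__ == "__main__":
    path = make_test_file(1 << 20)
    try:
        for bufsize in (512, 4096, 65536):
            total, calls, secs = timed_read(path, bufsize)
            print(f"bufsize={bufsize:6d} calls={calls:5d} bytes={total} time={secs:.6f}s")
    finally:
        os.unlink(path)
```

Smaller buffers need more system calls to move the same data, which is the kind of effect that tracing makes visible; wall-clock times from a single run like this are noisy and would need repeated measurement before drawing conclusions.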
The module lecturer will give a short introductory lecture at the start of each lab, and instructors will be on-hand throughout labs to provide assistance. Lab participation is not directly included in the final mark, but lab work is a key input to lab reports that are assessed.
Assessment
Each student will write three lab reports, roughly 5-10 pages each including several figures. The first is a 'practice run' that will be used to develop lab-report structure, content, and presentation, and will not contribute to the final mark. The remaining two lab reports will each contribute 50% to the final mark.
Recommended reading
Primary module texts
Course texts provide instruction on statistics, operating-system design and implementation, and system tracing. You will be asked to read selected chapters from these, but will likely find other content in them useful as you proceed with the labs.
Marshall Kirk McKusick, George V. Neville-Neil, and Robert N. M. Watson. The Design and Implementation of the FreeBSD Operating System, 2nd Edition, Pearson Education, Boston, MA, USA, September 2014.
Brendan Gregg and Jim Mauro. DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD, Prentice Hall Press, Upper Saddle River, NJ, USA, April 2011.
Raj Jain. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience, New York, NY, USA, April 1991.
Additional texts
Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne. Operating System Concepts, Eighth Edition, John Wiley & Sons, Inc., New York, NY, USA, July 2008.
Brendan Gregg. Systems Performance: Enterprise and the Cloud, Prentice Hall Press, Upper Saddle River, NJ, USA, October 2013.
Research-paper readings
Research-paper readings will be announced as the term proceeds, but will likely include original papers on BPF, DTrace, OS scheduling, OS scalability, network stacks, and systems modelling.
