Computer Laboratory

Malcolm Scott

Supervision exercises: Computer Networking (2016)

You are expected to produce solutions to all the questions for each supervision. I don't expect long essay-style answers or intricate diagrams (I've listed a number of marks for each question as a clue to how much you should write); spend longer thinking about the questions than writing your answers.

Email your solutions to me at least 24 hours before the supervision starts. If you prefer to hand-write your answers, I'd much prefer it if you could email scans. I prefer PDF, formatted with wide margins and plenty of white space, so that I can annotate it.

When emailing me regarding supervisions (or any other lab business) please only use my lab address, or your email will be misfiled and may slip by unnoticed:

Some of these questions were inspired by, adapted from or blatantly stolen from numerous sources in the lab and elsewhere, to whom I am grateful.


Supervision 1

1.0. Terminology and Dubious Analogy

  1. Consider a communication network consisting of a room full of people, where one or more people are exchanging thoughts and ideas with one or more others by talking.
    1. For each of the abstract terms
      • node
      • channel
      • entity
      • layer
      • transmission (the act thereof)
      • coding
      • addressing
      • multiplexing
      identify one or more corresponding concrete components or activities within the system. If the correspondence is not exact, give the closest approximation you can, and explain why it is not exact. [4 marks]
  2. Multiplexing basics:
    1. Give an example of multiplexing in a real, non-computer-related system. Is the multiplexing centralised or distributed? Why? [2 marks]
    2. What kind of network traffic is suited to synchronous time-division multiplexing? [2 marks]
    3. Give three ways in which asynchronous time-division multiplexing is more complex than synchronous time-division multiplexing. Why is asynchronous TDM used on the Internet? [5 marks]

1.1. Layering

  1. What is the purpose of the OSI model in the modern internet? Does it always serve that purpose? [2 marks]
  2. For each of the lower four layers of the OSI model (starting with the physical layer), give:
    • a brief description of the layer;
    • an example used on the Internet;
    • where it's implemented in the Internet in interconnection equipment and/or end systems;
    • a brief explanation of why the layer is useful (i.e., why can't we just omit it from the stack?).
    [8 marks]
  3. Topic 3 is entitled "The Data Link Layer". However, several of the slides are entirely concerned with other layers, or apply generally to any layer, so the title of the topic could be considered misleading. For each of the topics in slides 8 to 41 inclusive ("Coding – a channel function" to "Error Detection vs Correction"), identify which layer(s) it is concerned with. [6 marks]

1.2. Cross-layer effects

Suppose you enter a café with your laptop to find many independent open wi-fi networks run by nearby establishments to which your laptop can connect. Each network has two wireless access points connected to the same router; none of the networks require authentication. Your laptop initially associates with the access point with the strongest signal—but the signal strengths of the access points vary over time, so your association will also change over time (every few minutes).

For simplicity, assume that each network's router just provides IPv4, uses NAT, and is served by a different Internet Service Provider to each other network (i.e. it is equivalent to a "typical" home broadband installation).

We are interested in the effects of a changing link-level association, and the consequent change of IP address when an association changes.

  1. Suppose you are browsing the web using HTTP/1.0, and only occasionally downloading web pages. Is the changing link-layer association likely to be a problem for you? [2 marks]
  2. Now suppose that you start a large file transfer over TCP that takes half an hour to complete, and your laptop's link-layer association changes during the file transfer. Is the changing association likely to be a problem for you? [4 marks]
  3. If you identified any potential problems, suggest a few ways in which the user or the network operator might work around them. [3 marks]

(Adapted from Computer Networking, 3rd ed, review question 6.4)

1.3. Modern applications vs. the hourglass model

The Internet was designed according to an "hourglass model" in which IP acts as a lowest-common-denominator network layer protocol, underneath a variety of transport layer protocols and countless application layers (and, in turn, masking the details of various different data-link and physical layers from the application).

However, modern Internet applications have moved the neck of the hourglass upwards, often using HTTP (or HTTPS) as a lowest-common-denominator protocol, despite HTTP being originally intended only for the transfer of web pages. For example, Twitch.tv streams realtime live video over HTTP, Facebook uses HTTPS for text chat, Gmail provides a HTTPS interface to email and Skype will in some circumstances use HTTPS for realtime audio/video calls.

  1. Why did the designers of these and other applications choose to use HTTP for a purpose other than that for which it was designed? What benefit does this bring? [4 marks]
  2. What are the drawbacks of using HTTP for realtime audio/video? [2 marks]
  3. What effect might this ubiquity of HTTP have on future software developers, network engineers and/or computer scientists? [3 marks]

1.4. Wireshark basics—practical

Wireshark is a tool to capture and analyse network frames/packets sent from and received by your computer. We might use it in supervisions to illustrate what is actually going on in a network.

Wireshark needs administrative privileges to capture packets, so will not work on the MCS. You can install it on your own machine, though (it works on Linux, Windows or OS X). Once installed on Linux, you may need to run sudo wireshark to capture packets (but be aware of the risks of running complex applications as root); see the Wireshark wiki on Capture Privileges for other options. The Capture Setup wiki page may also be useful.

If you cannot get it to work, or do not have a suitable computer, let me know and I'll arrange something.

Please familiarise yourself with Wireshark. You might wish to consider the following tasks:

  1. Use Wireshark to capture a few packets whilst you use the Internet (e.g. load a web page, or send an email). You will probably have to read the documentation and/or experiment a bit to figure out how to do this. The result should look something like this screenshot.

    The top part of the window is a list of frames/packets. The middle shows a dissection of the contents of the selected frame. At the bottom is the raw, undecoded data of the selected frame exactly as received.

  2. Select a few frames and examine the dissections. Consider why (almost) every frame contains several protocols.
  3. Which protocols are always present in every packet, and which change depending on the packet you have selected? How do the protocols fit together to form a single frame? Is the content of your Internet use visible in the packet (i.e. can you see the web address you visited, the email address you contacted, etc.?) Why / why not? (The answer will depend on the protocol you picked.)

I'm not expecting you to provide answers to this part; I'd just like you to have thought about these things before the supervision.

Supervision 2

2.0. More wireless multiple access: CDMA

Note: don't confuse CDMA and CSMA.

This question gives an example of CDMA, so that you can demonstrate that it works—because it's not necessarily obvious that it does!

Consider a CDMA scenario with two senders and two receivers. The chipping rate is 8 mini-slots for each data bit. The 8-bit CDMA code for sender 1 is 1, -1, 1, -1, 1, -1, 1, 1. The 8-bit CDMA code for sender 2 is 1, 1, 1, -1, 1, 1, -1, -1. (See figure below.)

  1. Sender 1 has two data bits to send: a 1 followed by a -1; sender 2 also has two data bits to send: a -1 followed by a 1. The figure below shows the first two mini-slot bits sent by each sender, and the first two mini-slot combined bits values in the channel. Compute the remaining sequence of mini-slot bits sent into the channel by sender 1 and by sender 2, and the remaining combined bit values on the channel—i.e. fill in the grey-shaded regions in the figure to the right. [3 marks]

    Figure

  2. Assume now that there are two receivers. Receiver 1 wants to obtain the two data bits sent from sender 1 and knows sender 1's CDMA code; similarly, receiver 2 wants to receive the two data bits sent from sender 2 and knows sender 2's CDMA code. Both receivers receive the 16 mini-slotted bits in the combined channel (which you calculcated earlier). Show how each receiver performs the CDMA decoding operation in order to fill in the right-hand grey-shaded regions. [3 marks]

    Figure

(Computer Networking, 3rd ed, review questions 6.2 & 6.3)

2.1. Network layer protocols

  1. Draw diagrams showing the structure of an IPv4 packet and of an IPv6 packet. Briefly state the purpose of each header field. (You might want to make use of Wireshark to examine some real-life IP headers and how they relate to the raw data in a packet.) [10 marks]
  2. Where in relation to this structure would you look to find a link-layer header (e.g. Ethernet MAC) and a transport-layer header (e.g. TCP or UDP)? [2 marks]
  3. Automatic configuration is the process by which a node obtains an IP address and other details necessary to communicate at the network layer.
    1. IPv4 autoconfiguration is handled by DHCP. Describe the main configuration parameters which DHCP provides to nodes, and the purpose of each. [4 marks]
    2. Compare and contrast DHCP with IPv6 autoconfiguration, in terms of
      • how IP addresses are assigned to nodes,
      • the configuration parameters provided to nodes,
      • the interactions between nodes and the configuration server, and
      • the design motivation
      [8 marks]
  4. Describe the evolution of IPv4 addressing (class-based networks, subnetting, CIDR) including, for each:
    • the basic operation
    • the motivation for this scheme's introduction
    • how a router looks up the next hop for a packet
    [9 marks]
  5. During the last few years, almost all of the remaining IPv4 address space has been allocated. State a few consequences this could have on the efficiency of IPv4 addressing and routing. [3 marks]

2.2. Routing protocols

  1. State the purpose of routing protocols and routing tables. [2 marks]
  2. What is an Autonomous System (AS)? Contrast interior and exterior routing without describing in detail specific routing algorithms or protocols. [6 marks]
  3. What are the pros and cons of distance vector versus link state routing protocols? Give examples from protocols in use today. [10 marks]
  4. Where are hybrid schemes employed and why? [5 marks]
  5. Find out how to view the IPv4 routing table on your computer. How might this differ (in ways other than size) to the routing table on a router on the university network, and to the routing table on a router connected to an Internet Exchange Point (e.g. LINX)? [3 marks]

(First part of question is loosely adapted from 2008 Paper 8 Question 3)

2.3. Transport layer protocol selection

  1. DNS (the Domain Name System, an application layer protocol used to look up the corresponding IP addresses of host names such as www.cl.cam.ac.uk) originally only used UDP. Suggest reasons why DNS might have been designed based on UDP rather than TCP. [4 marks]
  2. Recently the internet has started to deploy DNSSEC (DNS Security extensions), which involves transmitting cryptographic signatures along with DNS data. These signatures are sometimes multiple kilobytes in size. Why might this motivate the use of TCP rather than UDP for DNS? (No specific knowledge of DNSSEC is required to answer this question.) [4 marks]

Supervision 3

3.0. Simplified TCP flow control

This question is adapted from 2002 Paper 9 Question 3, which is actually a Digital Communication II question. You should still be able to answer it, but you may give less detail than a Part II student might!

TCP incurs loss to discover the available capacity.

  1. Why does it do this? [1 mark]
  2. How can packet loss usually be detected without the sender waiting for an ACK timeout? How quickly can an overflowing router buffer be detected? [2 marks]
  3. Describe the AIMD (Additive Increase, Multiplicative Decrease) mechanism, and fast retransmit, and show how this leads to the characteristic "saw tooth" throughput behaviour of TCP over time. [5 marks]
  4. Consider a TCP connection operating in steady-state whereby each time the congestion window increases to W segments a single packet loss occurs (i.e. fast retransmit is working as intended, the congestion window is oscillating around the maximum capacity of the channel, and there is no other traffic competing for the channel). In terms of W and the round trip time (R), derive a simple formula for the time between the minimum and maximum data rates achieved. In this optimal scenario, how many packets are sent between each loss event? [4 marks]
  5. Derive the connection's average throughput in terms of the fraction of packets lost (p), the connection's round trip time (R), and the segment size (B). [4 marks]
  6. Under what conditions is this very simplistic model likely to be accurate? [3 marks]

3.1. Further TCP flow control

  1. Consider this plot of CWND (congestion window) versus time for a TCP connection.

    Figure

    At each of the marked points A through G along the timeline, indicate what event has happened, or what phase of congestion control TCP is in (as appropriate), from the following options: Slow-Start, Congestion-Avoidance, Fast-Retransmit and Timeout. [7 marks]

3.2. TCP performance

Use the approximate equation for throughput as a function of drop rate:

throughput = (√1.5 × MSS) / (RTT × √p)

Assume RTT, the round trip time, is 40 ms and MSS, the maximum segment size, is 1000 bytes. p is the proportion of packets which are dropped: one packet is dropped in every 1/p packets. In the following questions ignore packet headers in your calculations.

  1. What drop rate p would lead to a throughput of 1 Gbps (1 Gigabit per second)? [1 mark]
  2. What drop rate p would lead to a throughput of 10 Gbps? [1 mark]
  3. If the connection is sending data at a rate of 10 Gbps, how long on average is the time interval between drops? [1 mark]
  4. What window size W (measured in multiples of the MSS) would be required to maintain a sending rate of 10 Gbps? [1 mark]
  5. If a connection suffered a packet drop on reaching 10 Gbps, how long would it take for it to return to 10 Gbps after undergoing a fast retransmit? [1 mark]
  6. What can you conclude about TCP's performance on high-capacity links? [4 marks]

3.3. TCP and buffers

  1. How big should buffers in routers and switches be made in order to maximise the throughput of TCP? (There is no need to derive the formula; answer in words, not maths!) [2 marks]
  2. What problems might this cause for UDP-based Voice-over-IP traffic? [2 marks]
  3. How might these problems be mitigated? [4 marks]

Supervision 4

4.0. DNS details

A tool called "dig" is used to interrogate DNS servers for debugging purposes. It's fairly commonly available on UNIX systems (including linux.ds.cam.ac.uk which you can probably SSH to). Familiarise yourself with the dig command. (The manual page, also viewable with the command man dig, provides documentation; this quick tutorial might also be useful.) If you don't have access to a machine with dig installed, there is a web-based version available.

  1. Find examples of at least five different types of DNS record via dig and briefly explain their purpose. [5 marks]
  2. Manually perform an iterative DNS query of a domain name of your choice, by running a separate dig command for each iteration. Explain the process. [3 marks]
  3. Why are iterative queries necessary? [8 marks]
  4. Why are iterative queries typically not done by individual client hosts (i.e. your desktop, laptop or phone)? What do client hosts do instead? [3 marks]
  5. Bonus question slightly outside the course: Clients may in some circumances want to perform their own iterative queries regardless, instead of using the recursive DNS server provided by the network to which they are connected. Why? What extra guarantees can this provide?

4.1. Client/server vs. peer-to-peer

  1. Compare (briefly) the relative merits of client/server and peer-to-peer application models. [4 marks]
  2. Classify each of the systems below as client/server, peer-to-peer or hybrid, and explain why the designers may have decided to implement the system in that way:
    1. eBay
    2. Skype
    3. BitTorrent
    4. SSH (Secure Shell)
    5. DNS
    For those which are peer-to-peer, explain how a new client joins the network and locates the data it is interested in. [10 marks]—adapted from Computer Networking, 3rd ed, review question 2.1
  3. The Cambridge supervision system can be thought of as a peer-to-peer network. The department or your college acts as a rendezvous point or tracker, keeping lists of students and supervisors and enabling them to communicate directly, with the intention that the supervisor will (in theory!) pass knowledge data to the student. Furthermore a student who has taken the course may then choose to become a graduate student and rejoin the network as a supervisor in order to reshare the data. Comment on the durability and scalability of this system, comparing it with one of the peer-to-peer systems from the previous question. [4 marks]

4.2. Evolution of HTTP

A quick glossary:

  • In traditional non-persistent HTTP, each TCP / HTTP connection can only transfer one file. To transfer multiple files, the browser must close and re-open the TCP connection between each file.
  • In persistent HTTP (part of HTTP/1.1), the browser can keep the TCP / HTTP connection open after downloading a file in order to download subsequent files.
  • Persistent HTTP can also optionally use pipelining which means that a server can make a second (or subsequent) request before the first response has finished (the second file will be sent immediately as soon as the first download has finished, without waiting).
  • Recent browsers also use parallel HTTP connections to download multiple files simultaneously.
  • By response time I mean the time from the user requesting a URL to the page and all its components (images, etc.) being completely downloaded.

Here, we consider the performance of HTTP, comparing non-persistent HTTP with persistent HTTP. Suppose the page your browser wants to download is 100K bits long, and contains 10 embedded images (with file names img01.jpg, img02.jpg, … img10.jpg), each of which is also 100K bits long. The page and the 10 images are stored on the same server, which has a 300 msec roundtrip time (RTT) from your browser. We will abstract the network path between your browser and the Web server as a 100 Mbps link. You can assume that the time it takes to transmit a GET message into the link is zero, but you should account for the time it takes to transmit the base file and the embedded objects into the "link." This means that the server-to-client "link" has both a 150 msec one-way propagation delay, as well as a transmission delay associated with it. In your answer, be sure to account for the time needed to set up a TCP connection (1 RTT).

  1. Assuming non-persistent, non-parallel HTTP, how long is the response time for this page? Be sure to describe the various components that contribute to this delay.
  2. Again, assume non-persistent HTTP, but now assume that the browser can open as many parallel TCP connections to the server as it wants. What is the response time in this case?
  3. Now assume persistent HTTP (HTTP/1.1). What is the response time, assuming no pipelining?
  4. Now suppose persistent HTTP with pipelining is used. What is the response time?

[4 marks]—Computer Networking, 3rd ed, review question 2.8

4.3. Scaling HTTP

Traditionally, a website would be served from a single HTTP server. A single server is sometimes not sufficient to handle the load of a very busy modern website, though. The most obvious concern—but not the only concern—is the sheer quantity of network traffic involved in serving a media-rich website.

  1. List some other ways (besides the volume of network traffic exceeding the capabilities of the server's network connection) in which a single server may struggle to meet the demands of a busy website. [4 marks]
  2. List some advantages and disadvantages of using a load balancer to allow multiple web servers to host a single website. [4 marks]
  3. List some advantages and disadvantages of using a content distribution network (CDN) to cache this website. [2 marks]
  4. Is anycast useful to improve the scalability of a website? Why / why not? [6 marks]