2.3. Structures of HTTP queries and responses

To answer this question, we need to consider, briefly, the nature of a web request. Exactly what gets sent to a server when a URL is requested? (And for that matter, what gets sent back?)

Here's an example of what might get sent:

GET /index.html HTTP/1.1
Host: www.dept.cam.ac.uk
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.1) ...
Accept: text/xml,application/xml,application/xhtml+xml,text/html;...
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

To understand name-based virtual hosting, consider just the first two lines. The GET line includes only the local element of the URL. The second line specifies the host name that the request is addressed to.

HTTP Query Structure

GET

The first term of the first line declares that this is a request from a client that wishes to read information from the server. GET is the most common HTTP method.

/index.html

The second term in the first line is the local element of the URL requested. Note that the leading part of the URL containing the server name has been stripped out.

HTTP/1.1

The final element declares that the query is couched in the language of version 1.1 of the HTTP standard.
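As a minimal sketch of how the three terms of the first line fit together, the request line can be split on spaces. The function name parse_request_line is our own invention for illustration and is not part of any server.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, target, version)."""
    # The three terms are separated by single spaces.
    method, target, version = line.strip().split(" ", 2)
    return method, target, version


print(parse_request_line("GET /index.html HTTP/1.1"))
# ('GET', '/index.html', 'HTTP/1.1')
```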

Host: www.dept.cam.ac.uk

The second line indicates which server the query was addressed to. It is this element of the query that allows a web server to distinguish between web sites based purely on their names, regardless of the port number(s) or IP address(es) used.
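The idea behind name-based virtual hosting can be sketched as a lookup keyed on the Host header. This is only an illustration, not how any particular server implements it: the second host name, the directory paths, and the names SITES, DEFAULT_ROOT, parse_headers and document_root are all assumptions made up for the example.

```python
# Hypothetical mapping from host name to document root; the paths and
# the second host name are invented for this example.
SITES = {
    "www.dept.cam.ac.uk": "/srv/www/dept",
    "www.other.cam.ac.uk": "/srv/www/other",
}
DEFAULT_ROOT = "/srv/www/default"


def parse_headers(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def document_root(headers):
    """Pick a site purely from the Host header."""
    host = headers.get("host", "").lower()
    host = host.split(":", 1)[0]  # drop an optional port (ignores IPv6 literals)
    return SITES.get(host, DEFAULT_ROOT)


request_headers = parse_headers([
    "Host: www.dept.cam.ac.uk",
    "Connection: keep-alive",
])
print(document_root(request_headers))  # /srv/www/dept
```

Note that the path in the GET line plays no part in choosing the site; only the Host header does.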

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)

This optional line identifies the browser. Some servers vary the output according to this header, but you should remember that it is a hint and can be trivially changed on many browsers.

In this case Mozilla identifies the browser as one of the Netscape/Mozilla family and 5.0 narrows it down to a version of Mozilla. The other information tells us that the browser is running under Linux on an Intel platform and that it was built for the en-GB locale, and gives the version numbers of the various components.

Accept: text/xml,application/xml,application/xhtml+xml,text/html; q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

This specifies the formats the browser can accept and how keen it is on them. Servers can be configured to negotiate various different formats of response depending on these parameters.

text/xml,application/xml,application/xhtml+xml means that the browser is happy to accept the MIME content types text/xml, application/xml, or application/xhtml+xml. It will also accept text/html, but the qualifier q=0.9 means that, given a choice, the browser would prefer to receive one of the earlier types (default q=1.0). text/plain means that it can accept plain text too; the qualifier q=0.8 makes this less preferred than any of the types named before it. image/png carries no qualifier, so it has the default q=1.0 and ranks alongside the first three types. Finally, the browser will accept any other format (*/*) but is not keen on them (q=0.5).
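The ranking described above can be sketched by parsing the header into (type, q) pairs and sorting by q. This is a simplified illustration: parse_accept is a name of our own, and the sketch ignores media type parameters other than q.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        q = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((media_type, q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


accept = ("text/xml,application/xml,application/xhtml+xml,text/html; q=0.9,"
          "text/plain;q=0.8,image/png,*/*;q=0.5")
for media_type, q in parse_accept(accept):
    print(q, media_type)
```

The four types without a qualifier (including image/png) come out first with q=1.0, followed by text/html, text/plain and finally */*.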

We will meet MIME content types again in Chapter 4.

Accept-Language: en-gb,en;q=0.7,es;q=0.3

Just as it is possible to negotiate formats it is possible to negotiate languages. A page might appear in more than one language and the browser specifies what languages it can cope with and how desirable they are. One of the authors of this document is learning Spanish and has Spanish as a third choice in the language selections after British English and any other sort of English.
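As a rough sketch of the server's side of this negotiation, the following picks the most desirable language out of those in which a page happens to exist. The function name choose_language and the sets of available languages are assumptions for the example, and the matching is deliberately naive: it compares tags exactly, whereas real servers also match a general tag such as en against a more specific one.

```python
def choose_language(accept_language, available):
    """Return the available language tag with the highest q, or None."""
    best, best_q = None, 0.0
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        params = params.strip().lower()
        q = 1.0  # default when no q parameter is given
        if params.startswith("q="):
            q = float(params[2:])
        # 'available' is expected to hold lower-case tags.
        if tag in available and q > best_q:
            best, best_q = tag, q
    return best


header = "en-gb,en;q=0.7,es;q=0.3"
print(choose_language(header, {"es", "fr"}))  # es
print(choose_language(header, {"en", "es"}))  # en
print(choose_language(header, {"fr"}))        # None
```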

Accept-Encoding: gzip, deflate

Just as there was negotiation over MIME content type there can also be negotiation over the encoding of the content (HTTP calls this a content coding). This is a mechanism for the server and browser to agree on a way to (typically) compress the response body prior to transfer.
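To give a feel for what the server does with this header, here is a minimal sketch that compresses a body with gzip only if the client offered it. The function name encode_body and the sample body are assumptions for the example, and the sketch ignores q values, so it would mishandle a client that lists gzip with q=0.

```python
import gzip


def encode_body(body, accept_encoding):
    """Return (content_coding, payload) for the given Accept-Encoding value."""
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        return "gzip", gzip.compress(body)
    return "identity", body  # send the body unchanged


# A deliberately repetitive sample body, so that it compresses well.
body = b"<p>This is the DEPT web site.</p>\n" * 50

coding, payload = encode_body(body, "gzip, deflate")
print(coding, len(body), len(payload))

# The browser reverses the coding to recover the original bytes.
print(gzip.decompress(payload) == body)  # True
```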

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

The final topic for negotiation is the character set of any text that will be sent. In this case, ISO Latin 1 is preferred, with UTF-8 and indeed everything else coming second.
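Why the character set matters can be seen by encoding the same text both ways. The sample string is our own; any text containing a non-ASCII character would do.

```python
text = "café"  # sample text with one non-ASCII character

latin1 = text.encode("iso-8859-1")
utf8 = text.encode("utf-8")

print(latin1)  # b'caf\xe9'
print(utf8)    # b'caf\xc3\xa9'

# Decoding with the wrong character set silently garbles the text,
# which is why the two ends need to agree.
print(utf8.decode("iso-8859-1") == text)  # False
```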

Connection: keep-alive

This tells the server that it need not close the network connection after sending back the response to the query as other requests may be sent down the same connection. As setting up and tearing down connections are expensive operations this is a major efficiency boost.
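A server's decision about whether to leave the connection open can be sketched as follows. In HTTP/1.1 connections are persistent unless one side says close, whereas an HTTP/1.0 client has to ask for keep-alive explicitly. The function name wants_keep_alive is an assumption for the example.

```python
def wants_keep_alive(version, headers):
    """Decide whether the client expects the connection to stay open.

    'headers' maps lower-cased header names to their values.
    """
    tokens = [t.strip().lower()
              for t in headers.get("connection", "").split(",")]
    if version == "HTTP/1.1":
        return "close" not in tokens      # persistent by default
    return "keep-alive" in tokens         # HTTP/1.0: only on request


print(wants_keep_alive("HTTP/1.1", {"connection": "keep-alive"}))  # True
print(wants_keep_alive("HTTP/1.1", {"connection": "close"}))       # False
print(wants_keep_alive("HTTP/1.0", {}))                            # False
```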

Keep-Alive: 300

This asks the server to keep the connection alive for 300 seconds in case there are any more requests, dropping it after 300 seconds of idleness. It is only a request: the server is free to apply its own, shorter limit.

For the record, here is the response. To make the example work, I've installed a trivial index.html web page. We will use this later.

HTTP/1.x 200 OK
Date: Wed, 21 Feb 2007 17:53:35 GMT
Server: Apache/2.2.3 (Linux/SUSE)
Last-Modified: Wed, 21 Feb 2007 17:53:33 GMT
Etag: "1c0e43-132-3caa4140"
Accept-Ranges: bytes
Content-Length: 306
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>The DEPT web site</title>
</head><body>
<h1>Welcome to DEPT</h1>
<p>This is the DEPT web site.</p>
</body>
</html>
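The response has the same overall shape as the query: a first line, some headers, a blank line, and then the body. A minimal sketch of taking one apart follows. The function name parse_response is our own, and the sample response is a shortened, made-up version of the one above, with the Content-Length computed to match its one-line body.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


# Shortened sample response; the body is one line taken from the page above.
sample_body = b"<p>This is the DEPT web site.</p>\n"
raw = b"\r\n".join([
    b"HTTP/1.1 200 OK",
    b"Server: Apache/2.2.3 (Linux/SUSE)",
    b"Content-Length: %d" % len(sample_body),
    b"Content-Type: text/html",
    b"",
    b"",
]) + sample_body

version, status, reason, headers, parsed_body = parse_response(raw)
print(status, reason)                                       # 200 OK
print(headers["content-type"])                              # text/html
print(int(headers["content-length"]) == len(parsed_body))   # True
```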