The task of allocating numbers to sites in the Internet has now become so vast that it is delegated to a number of organisations around the world - ask your Internet provider where they get the numbers from if you are interested.

This is a cautious estimate based on the host count made by Mark Lotter of DNS registered hosts. The true number is much higher than this, but by how much, no-one can accurately say.

It will have version number 6 rather than 5 (which would be one more than the current IP version, 4), because an experimental real-time protocol called ST is already using 5.

There is still a fiery debate raging about charging policies and mechanisms. The prevailing technical wind is behind the idea of keeping usage-based charging (e.g. connect time or number of packets sent) to a minimum, perhaps only charging for a premium service when the network is overloaded or congested. However, the old Public Network Operator view that more income is made by charging all the time is hovering in the wings, despite the evidence that it costs a huge percentage of profit to collect such charges and that it discourages new users. We believe that charging in proportion to the value of data delivered from information services is a far more valid approach, and that the networks really will become much like the roads in terms of cost recovery.

An RFC is a Request For Comments, and is part of the standardisation process of the Internet Engineering Task Force

Try ``telnet machine smtp''. In fact, for many information services, the application search or browse protocol is text based and can be driven by a human typing (albeit rather obscure) commands, rather than by a client GUI program! Try this with WWW: ``telnet www.cs.ucl.ac.uk 80''. See later chapters for what to type next and what comes back though!
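
As a rough sketch of what driving such a text-based protocol by hand amounts to, the Python below builds the text of a minimal web request and, if asked, sends it over a plain TCP connection. The function names build_request and fetch are hypothetical, and the simple HTTP/1.0 request form is an assumption made for illustration; later chapters describe the real exchange.

```python
import socket

CRLF = "\r\n"  # each protocol line ends with carriage return + line feed


def build_request(path="/"):
    """Return the text a human could type by hand after connecting."""
    # One request line, then an empty line to say "no more headers".
    return "GET " + path + " HTTP/1.0" + CRLF + CRLF


def fetch(host, port=80, path="/", timeout=10.0):
    """Open a TCP connection, send the request, return the raw reply bytes.

    Hypothetical helper: needs network access and a reachable server.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(path).encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.cs.ucl.ac.uk") would play the part of the telnet session above, provided the machine is still reachable.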

RFC822 is the basic standard for internet electronic mail

You can tell it is third generation as it talks about ``objects'' rather than files.

remember: ``on top of'' means ``encapsulated within'' as in 1.3

CRLF is "Carriage Return" (ASCII character 13) followed by "Line Feed" (ASCII character 10)

This bug is long since fixed. Basically, the finger daemon had storage for receiving a limited request/command, but could actually be handed a larger amount of information from the transport protocol. The extra information would overwrite the stack of the executing finger server program. An ingenious hacker could exploit this by sending a finger command carefully constructed with executable code that carried out his desired misdemeanour. The problem was exacerbated on many systems where the finger server ran as a special privileged process (root!), for no particular reason other than laziness of the designers of the default configuration. Thus the wily hacker gained access to arbitrary rights on the system.

The ITU is the International Telecommunication Union, the body that coordinates standards among the national telecom companies

hyper comes from the Greek prefix meaning above or over, and generally means some additional functionality is present compared with simple text. In this case, that additional functionality is in two forms: graphics or other media, and links or references to other pieces of (hyper)-text. These links are another component of the WWW, called Uniform Resource Locators.

Cache is usually, but not always, pronounced the same way as cash. It has nothing to do with money, or even ATM, whether ATM stands for Automatic Teller Machine, or Asynchronous Transfer Mode, or even Another Terrible Mistake.

or a long time if your link is slow and the file is large

Transmission Control Protocol - the internet protocol that attempts to achieve a reliable connection over the unreliable internet - see chapters 1 and 2

www.ncsa.uiuc.edu is the Domain Name System (DNS) name of a computer called www at the ncsa (National Center for Supercomputing Applications) group of uiuc (the University of Illinois at Urbana-Champaign), which is denoted an educational institution by the edu suffix. See chapter 1 for more details of DNS

a hyper-link is a way of linking pieces of information together, so that when a user follows the link, she is taken from one piece of information to another related piece of information

we tend to think of what we see on screen as being a front. The hard work goes on behind the scenes!

GIF stands for Graphics Interchange Format, and is one form a still image can take. GIF images are relatively compact as the data is compressed, so they're quite a good format to use in the Web

URLs used in links can be either absolute, in which case they specify the protocol, the machine, the directory and the filename, or relative, in which case the unspecified parts are assumed to be the same as they are for the page containing the link
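
The difference can be seen with a small Python sketch using the standard urllib.parse module. The base page and the link targets below are made up for illustration; only the machine names come from the text.

```python
from urllib.parse import urljoin

# Hypothetical page that contains the links.
base = "http://www.cs.ucl.ac.uk/docs/index.html"

# An absolute URL names protocol, machine, directory and file itself,
# so the containing page contributes nothing.
absolute = urljoin(base, "http://www.ncsa.uiuc.edu/demo/page.html")

# A relative URL borrows the missing parts from the containing page.
relative = urljoin(base, "pictures/logo.gif")

# "../" steps up to the parent directory before descending again.
parent = urljoin(base, "../other/index.html")

print(absolute)
print(relative)
print(parent)
```

Here the relative link resolves inside the docs directory of the same machine, while the absolute one ignores the base entirely.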

SGML is an ISO standard, for what that is worth.

Mosaic actually lets you configure which font you want to see for each heading from your Xresources, but this is independent from the actual markup specification - see chapter 4

most browsers let you delay image loading if you're working over a slow network

Fortunately this book isn't yet equipped with a mouse or a radio-modem, so this is unlikely to work!

typewriter style fonts are fixed width - i.e. all the characters are the same width. Book fonts and the default fonts used by WWW clients such as Mosaic are variable width fonts, so letters like ``l'' are narrower than letters like ``m''. Generally variable width fonts are more pleasant to read than fixed width fonts.

The parent directory of a directory is the directory above it in the filesystem tree. On a Mac, you may think of the parent folder of a particular folder as being the folder that contains it

We use the terms client and browser interchangeably

Embedded images need not necessarily be stored on the same server as the document they are embedded in

The image will be returned with an ``image'' content type

We still speak of a viewer program even when we're ``viewing'' audio!

note if you change your mailcap file, you may need to restart your client program before it will recognise the changes.

Remember: If you want to put information in the Web in an immutable way, then using some image or postscript or similar approach may be more appropriate.

In fact, without Windows, the best option for an MS DOS user is to buy dial up Internet access, and use a line mode client like lynx on the dial up Internet host. However, Windows tasking and networking are rapidly becoming available, making full access from GUI clients a lot easier

usually port 80

subject to the client passing any security restrictions the server may have

This can often be used to attempt to debug a misbehaving server. However, some servers don't check their input too carefully, and it may be possible to crash them by typing incorrect HTTP commands.

a proxy server need not have a file system, but most do

or Aliases as they're called on the Mac

For instance NCSA's HTTPD allows the directory cgi-bin to be defined as a ScriptAlias. MacHTTP allows you to configure filename extensions such as .script to denote executable scripts

URL encoding replaces spaces with ``+'' and encodes other special characters as ``%XX'' where XX is the ASCII code for the character in hexadecimal
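
The encoding can be checked with Python's standard urllib.parse module, which writes each escaped character as a percent sign followed by two hexadecimal digits. This is only a small sketch; the sample string is made up.

```python
from urllib.parse import quote_plus, unquote_plus

text = "fish & chips"  # made-up form value

# Spaces become "+", and "&" becomes "%" plus its ASCII code in hex.
encoded = quote_plus(text)

# The two hex digits used for "&" (ASCII 38 decimal).
ampersand_hex = format(ord("&"), "02X")

# Decoding reverses both substitutions.
decoded = unquote_plus(encoded)

print(encoded)
print(ampersand_hex)
print(decoded)
```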

This is the Department of Computer Science, University College London, or UCL CS

All together now - there ain't no Sanity Clause

available with the URL

The Mac File Type could be edited using ResEdit, but this would have to be done manually for every file.

The notation ``~a_user'' is often used to denote the home filestore of the user with username ``a_user'' on Unix systems

MIME is the Multipurpose Internet Mail Extensions, and MIME types are how a WWW server indicates to a WWW client what type of data is being returned
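
As an illustration only, Python's standard mimetypes module performs the same kind of mapping from filename extension to MIME type that a server typically does before replying. The filenames are hypothetical, and a real server uses its own configuration rather than this module.

```python
import mimetypes

# Hypothetical filenames a server might be asked for.
page_type, _ = mimetypes.guess_type("welcome.html")
image_type, _ = mimetypes.guess_type("logo.gif")

# A MIME type has a general type and a subtype, separated by "/".
image_major, image_minor = image_type.split("/")

print(page_type)
print(image_type)
```

The general type of the second result is ``image'', which is how the client knows to hand the data to its image display code.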

inetd is the Internet Daemon, which has a configuration file specifying which program should be started up to handle an incoming connection that arrives on a particular port

The notation ``~a_user'' is used to denote the home filestore of the user with username ``a_user'' on Unix systems

Common Gateway Interface

This can be useful in an environment like a university where you don't necessarily trust your own system's users!

like most programming languages, AppleScript is very unforgiving of typing mistakes, so if you've not programmed before, be very careful to ensure you don't miss any characters

the name cgi-bin is actually configurable as a ScriptAlias from the server's configuration file srm.conf. CGI is the Common Gateway Interface - a standard way of passing parameters into server scripts so that commands are portable across different server types
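
A minimal sketch of what ``passing parameters into server scripts'' looks like from the script's side, written in Python rather than the languages used elsewhere in the book. Under CGI the server places the part of the URL after the ``?'' in the QUERY_STRING environment variable; the function names and the ``name'' parameter below are hypothetical.

```python
import os
from urllib.parse import parse_qs


def read_parameters(environ=None):
    """Decode the parameters the server passed in QUERY_STRING."""
    environ = os.environ if environ is None else environ
    query = environ.get("QUERY_STRING", "")
    # parse_qs undoes the URL encoding and groups values by name.
    return parse_qs(query)


def respond(environ=None):
    """Build the script's reply: a header, a blank line, then the body."""
    params = read_parameters(environ)
    # "name" is a made-up parameter; fall back to a default if absent.
    name = params.get("name", ["world"])[0]
    return "Content-Type: text/plain\r\n\r\n" + "Hello, " + name + "\n"
```

A real script would write the result of respond() to its standard output, which the server relays to the client. Because the script only relies on the environment variable and standard output, the same code should work under any server that follows the interface.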

The location of the configuration directory is set using the ServerRoot command in the httpd.conf file

assuming the configuration file contains an Exec rule specifying that cgi-bin contains scripts to be executed

We could have written this in any number of languages, but perl allows us to illustrate the main points with a relatively short program! Perl is freely available on Unix, DOS, and Macintosh systems

the Bourne shell is the standard command interpreter on Unix systems

Very, very paranoid sites are concerned with two more security facets: traffic pattern analysis and covert signaling. Traffic pattern analysis might concern finance houses, who would be worried if it were known which combination of information providers they were gathering information from, since it might permit others to trade on this knowledge (impending mergers etc). Covert signaling is a way of carrying information piggy-backed on legitimate information. This is very handy for spies getting information out of secure sites.

There are some who believe that every type of Internet access should be billed for on a usage basis. This is problematic, and in fact it has been shown that it does not maximise profit. Only the user herself knows how "urgent" a file transfer is. With very many types of data around, the net can only charge for the ones it really knows about, such as long-distance voice or high quality video.

whatever generic means

Jon Crowcroft
Wed May 10 11:46:29 BST 1995