Department of Computer Science and Technology

Archived personal web sites

The Computer Laboratory preserves here the personal web pages of selected former members of staff, in particular material that the author maintained in their personal Computer Laboratory web space until their death and that we believe may be of continued scientific, technical or historical interest.

Archiving process

A former user’s original personal web space (~/public_html/) vanishes from our web server when we move their filer home directory into the departmental archive. We therefore create here an archival copy of its publicly visible part, which browsers can then reach via an HTTP 301 redirect from its original location.
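One way to check that such a redirect is in place is to inspect the response headers of the original URL. The following is only a minimal sketch: the helper name redirect_target and the example.org addresses are hypothetical, and the canned headers stand in for a real server response.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# Location target if (and only if) the status code is 301.
redirect_target() {
    awk '
        { sub(/\r$/, "") }                      # strip the CR of CRLF line endings
        NR == 1 { ok = ($2 == "301") }          # status line, e.g. "HTTP/1.1 301 Moved Permanently"
        ok && tolower($1) == "location:" { print $2; found = 1 }
        END { exit (found ? 0 : 1) }            # non-zero exit if no 301 redirect was seen
    '
}

# Demonstration with canned headers (hypothetical URL, not the real archive path).
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: https://www.example.org/archive/jd/\r\n\r\n' |
    redirect_target
```

Against a live server, the same helper could be fed from curl, for example: curl -sI http://www.example.org/~jd/ | redirect_target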

We collect the archived personal web pages preserved here using a web crawler (see /anfs/www/html/archive/Makefile for the technical details of this process). This way, we publish here only files to which the deceased had already linked publicly, thereby clearly indicating that they wanted these files to be published. We do this, rather than copying all files in their ~/public_html/ folder, because that folder typically also contains much other material that was clearly never intended for publication, or that was not even owned by the deceased, such as draft documents of other people that were placed there only temporarily for personal collaboration.

We may also add public files that were linked only from elsewhere and that our crawler may therefore have missed.

We modify the pages that we copy here only to add a note about their status and the death of their author. We generally avoid changing their content, except perhaps to fix minor and obvious technical problems.

Staff members interested in a preview of which part of their web site we might archive here after their death can simulate our default file-harvesting process using this Linux command:

$ wget -r -nv -np -nH -l inf -P test-archive \
       --trust-server-names --regex-type pcre \
       --reject-regex '\?C=[NMSD];O=[AD]$' \
       --cut-dirs=1 http://www.cl.cam.ac.uk/~$USER/

The above command is of course only a starting point, and we often tailor the crawling process to the peculiarities of the individual web site.
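The --reject-regex pattern in the command above matches the column-sorting links (such as ?C=N;O=D) found in Apache-generated directory listings, presumably so that the crawler does not fetch the same listing several times in different sort orders. The sketch below illustrates which URLs the pattern would drop. The sample URLs are hypothetical, and grep -E is used here in place of wget's PCRE engine, which is an assumption that happens to hold because this particular pattern means the same in both syntaxes.

```shell
#!/bin/sh
# The pattern given to --reject-regex in the wget command above.
pattern='\?C=[NMSD];O=[AD]$'

# Hypothetical sample URLs: a directory listing, two sort-order variants
# of that listing, and an ordinary file.
urls='http://www.cl.cam.ac.uk/~user/papers/
http://www.cl.cam.ac.uk/~user/papers/?C=N;O=D
http://www.cl.cam.ac.uk/~user/papers/?C=M;O=A
http://www.cl.cam.ac.uk/~user/papers/report.pdf'

# Keep only the URLs that do NOT match the pattern, as the crawler would.
kept=$(printf '%s\n' "$urls" | grep -Ev "$pattern")
printf '%s\n' "$kept"
```

Only the plain directory URL and report.pdf remain; the two sort-order variants are filtered out.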

If you have any questions or suggestions about this archive and the process, contact Markus Kuhn or pagemaster.