Department of Computer Science and Technology

Access control and passwords on web servers

This page explains how you can restrict access to some of your web pages to certain domains, users, or groups of users. You can do this by creating a .htaccess file in the relevant subdirectory and adding one or more Require directives, which control access to files in this directory and its subdirectories.

On 2016-10-04 we upgraded the main Lab web server from Apache 2.2 to Apache 2.4. The mechanism for restricting access changed between Apache 2.2 and 2.4, which means that .htaccess files need to be reviewed. Including the Apache module mod_access_compat eases the transition, but does NOT cope with mixed use of machine and user authorisation. Generally, any .htaccess file containing the directives Order, Satisfy, Allow or Deny should be updated to use the new syntax. A list of such files is at /anfs/www/tools/share/migration/htaccess-22syntax.txt. More information on the migration and some tools are available at /anfs/www/tools/share/migration/README.txt and on the Lab wiki.
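
As a rough illustration of such an update, the sketch below shows a typical Apache 2.2 block that mixes machine and user authorisation, followed by a possible Apache 2.4 equivalent. The domain example.org is a placeholder, and the sketch assumes that user authentication is already configured by the server defaults.

```apache
# Apache 2.2 syntax (old): allow clients from one domain OR any authenticated user
#   Order deny,allow
#   Deny from all
#   Allow from example.org
#   Require valid-user
#   Satisfy any

# Apache 2.4 syntax (new): several Require lines in a .htaccess file
# grant access as soon as any one of them matches
Require host example.org
Require valid-user
```

The old directives are shown commented out so that the fragment does not mix both syntaxes in one file.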

Apache access control

Our Apache web servers have three ways to identify (“authenticate”) users who request files:

  1. by the domain name or IP address of the client machine (mod_authz_host)
  2. using the University’s Raven web authentication system (Ucam-WebAuth protocol)
  3. via an HTTP password

Method (1) is useful for permitting password-free access for every user connected to the departmental or the University network. You normally want to combine it with one of methods (2) or (3) for the benefit of users who connect from outside the University network, e.g. from their home PC or mobile device.

Methods (2) and (3) are mutually exclusive, so you have to decide first whether you want to rely on the Raven password that users have already received from the University Information Services, or whether you want to maintain your own password file. Raven authentication is much less effort for you to set up, and is more secure, but it can currently only authenticate members of the University who have been assigned a CRSID. Most examples below use method (2).

Method (3) involves editing a password file using the Apache htpasswd command-line tool.

For details on all three options, read the Apache documentation section on Authentication and Authorization. The next section covers the simplest cases.

Common configuration examples

If one or more Require directives are present in a .htaccess file, then at least one of them has to match before a user is granted access. The following examples show some useful combinations:

  • Cambridge-wide access – allow access to anyone who connects from a Cambridge University Data Network IP address or who is able to log in to Raven:
    Require ip 2001:630:210::/44 2a05:b400::/32
    Require host
    Require valid-user

    Note: Many IPv6 hosts at the department currently lack a reverse DNS entry, and so will not be recognized as hosts by the web server, hence the addition of the IPv6 address prefix above.

  • Lab-wide access – allow access to anyone who is either using the Computer Laboratory LAN (including the PWF student PCs) or who is a member of the Computer Laboratory:
    Require ip 2001:630:212:200::/56 2a05:b400:110::/48
    Require host
    Require group all-cl-users
  • Course-specific access – allow access to anyone who is a member of the group students-part1b, all-cl-users, or cl-supervisors:
    Require group students-part1b all-cl-users cl-supervisors
  • Individual access – allow access to a list of Raven users:
    Require user gsm10 mgk25 maj1
  • Password access (non-Raven) – deactivate Raven and allow basic password-controlled access to user “supervisor” (the hash value of the password of each user is listed in file /homes/mgk25/.htpasswd):
    AuthType Basic
    AuthName "supervisor login available from Markus Kuhn"
    AuthUserFile /homes/mgk25/.htpasswd
    Require user supervisor
    (The “htpasswd” tool for changing your .htpasswd file is in the Ubuntu Linux package “apache2-utils”.)
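
All of the examples above rely on the default behaviour that any one matching Require directive is enough. If you instead need several conditions to hold at the same time, Apache 2.4 provides the RequireAll container. The following is a minimal sketch that reuses the group name and Lab IPv6 prefixes from the examples above. Whether such a combination suits your pages is for you to decide, and it assumes that the server's override settings permit the container in .htaccess files.

```apache
# Grant access only to members of cl-supervisors who are ALSO
# connecting from the Lab IPv6 address ranges.
<RequireAll>
    Require group cl-supervisors
    Require ip 2001:630:212:200::/56 2a05:b400:110::/48
</RequireAll>
```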

Predefined user groups

By default, our main web server defines the following groups of users (AuthGroupFile /auto/anfs/www/auth/group-raven):

students-part1a-50 CST Part IA 50% students
students-part1a-75 CST Part IA 75% students
students-part1a-cst both of the above
students-part1a-other NST Part IA, PPS Part I students
students-part1a all of the above
students-part1b-50 CST Part IB students who take Paper 3 in Part IB
students-part1b-75 CST Part IB students who took Paper 3 in Part IA
students-part1b all CST Part IB students
students-part2-50 CST Part II students who take Paper 7 in Part II
students-part2-75 CST Part II students who took Paper 7 in Part IB
students-part2 all CST Part II students
students-part3 CST Part III students
students-acs MPhil ACS students
students-acsas other "associated" students taking ACS modules (e.g. from other departments)
students-all all of the above
wednesday departmental teaching officers, senior staff, SRAs
all-cl-users all resident members of the department (including research students, visitors, interns, but excluding students on bachelor/masters courses)
undergrad-directors-of-studies members of the Computer Science DoS mailing list of the same name
cl-supervisors members of the Lookup group of the same name

Each of the students-* groups has a year-specific alias, with (mostly) immutable membership. For example, between early October 2008 and early September 2009, the group students-part1b is identical to the group students-part1b-0809. These year-specific aliases currently remain valid for four years. This facility is useful for giving students of a particular year access to solution notes after the end of supervisions, without the risk that this access is accidentally transferred to the next cohort too early the following September.
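
For instance, using the alias named above, a .htaccess file protecting solution notes for one particular cohort might look like this sketch:

```apache
# Only the 2008/09 Part IB cohort, regardless of the current academic year
Require group students-part1b-0809
```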

Denying access to groups “all-cl-users”, “students-part3”, or “students-acs” may be ineffective. In most cases, our main web server serves files by accessing them from the filer “elmer” via NFS, according to the “other” access-control bits set for the file and its parent directories. Anyone with access to the filer (roughly all-cl-users and masters students) can also see the same files that the web server serves via NFS or CIFS, and can bypass .htaccess restrictions this way. Routinely allowing access to members of “all-cl-users” ensures that the access control settings reflect what is usually enforced.

The web server runs under the user and group id “www-cl”, therefore it is, in principle, also possible to grant it access via the “user” or “group” permission bits.

It is also possible to test membership of Lookup groups via LDAP (although this is still a somewhat experimental facility).

General policy

As a general rule, the Computer Laboratory aims to make all its information as freely available as possible. Access on its web site should only be restricted where there is a specific reason to do so.

Typical examples for material where access control may be necessary:

  • Personal data
  • Material for which the copyright holder has not given permission for global web release
  • Model answers to exercises and past exam questions
  • Draft publications that are not yet ready for release

Restricting search engines

You can also use .htaccess files to add the HTTP header field X-Robots-Tag to downloaded files. This instructs the crawlers of some search engines (e.g., Google) to skip these files for certain operations. For example, to discourage such search engines from indexing the text in any files under that directory, add to .htaccess the line

Header add X-Robots-Tag noindex

To prevent these search engines from both indexing the text and following any of the links in it, use

Header add X-Robots-Tag "noindex, nofollow"

There is also "noarchive" to discourage them from keeping a cached or archival copy of your files.
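
These values can be combined, and the header can be limited to certain files. The sketch below assumes that only PDF files in the directory need this treatment. The FilesMatch pattern is an illustrative choice, not a Lab convention.

```apache
# Discourage indexing and cached copies, but only for PDF files
<FilesMatch "\.pdf$">
    Header add X-Robots-Tag "noindex, noarchive"
</FilesMatch>
```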

Redirecting to other URLs

You can also place Redirect or RedirectMatch directives into your .htaccess files, to redirect browsers that try to access one file to some other URL, e.g. if material has moved elsewhere:

Redirect permanent /~crsid/draft-paper.pdf
Redirect temp /teaching/current /teaching/1819
Redirect seeother /~crsid/obsolete-document.pdf /new-document/
Redirect gone /~crsid/deleted-paper.pdf

HTTP distinguishes semantically between four different forms of redirect:

  • Redirect permanent old-path new-path – the resource has permanently moved to a new location and any uses of the old URL should therefore be updated.
  • Redirect temp old-path new-path – the resource can currently be found at this other location, but there is no need to update any uses of the old URL.
  • Redirect seeother old-path new-path – this redirect points to a location that is related to the requested information, but is not identical (e.g., the new home page of the successor project, rather than the new location of that particular PDF). Manual confirmation of the link is needed before updating the URL.
  • Redirect gone old-path – signals that there is no new URL for this resource, and the link should be deleted.

Here old-path must be a local absolute path starting with a slash, whereas new-path can be either a complete URL or a local absolute path starting with a slash.

If old-path is a directory, the redirect will apply recursively to all files and directories in it. If you just want to redirect accesses to index.html, then end old-path in .../index.html. The redirect will then also apply if the requested URL just ended in a slash, but will not apply recursively.
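
RedirectMatch works like Redirect, but matches the requested path against a regular expression instead of a path prefix. The following sketch uses hypothetical paths under /~crsid/ purely for illustration.

```apache
# Hypothetical paths, for illustration only.
# Send every .html page under old-notes/ to the same file name under notes/
RedirectMatch permanent "^/~crsid/old-notes/(.*)\.html$" "/~crsid/notes/$1.html"

# Redirect only the directory index, not the files below it
Redirect temp /~crsid/project/index.html /~crsid/project-v2/
```

In the RedirectMatch line, $1 stands for the text captured by the parenthesised group in the pattern.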