Computer Laboratory


Filespace in cl.cam.ac.uk

This document covers the general principles of file system usage within the department, as well as details that relate specifically to use from the two supported classes of operating system ("Unix"-type and Windows). Windows-specific information is at Windows access to the File Server.

The department is fairly well-endowed with disc space, and has a sophisticated backup system. This document will explain what you should keep backed-up, and what you should keep on temporary storage.

Disc space in the department

The department has a large file server, known as elmer, or the filer. In addition, managed desktop machines have their own discs. Thus lab members typically have access to several types of disc space:

  • The home directory – personal space, allocated to the user, on the department's file server.
  • "Temporary" space, on the computer's own disc.
  • "Global temporary" space, on the department's file server.
  • "Scratch" space, allocated to the user, on the computer's own disc.
  • "Scratch" space, allocated to the user, on the department's file server.
  • Shared space, typically allocated to a research group, also existing on the department's file server.
  • Disc space occupied, and used by, the operating system on their own machine. This space is not of much interest to the general user.

The following subsections discuss these different allocations.

Unix vs Windows views of the file server

File paths under Unix are handled in the usual Unix (NFS) way, so a particular directory might be /auto/userfiles/gsm10, for example. File paths under Windows are handled in the usual Windows (CIFS) way, so a particular folder might be \\filer\userfiles\gsm10, for example. Because the shared files housed on the file server can be accessed from either Unix or Windows systems, both of these paths map to the same place:
i.e. /auto/userfiles/gsm10 (Unix) = \\filer\userfiles\gsm10 (Windows).

In general, to map from a Unix path to a Windows path, reverse the direction of the slashes, make the first slash a double, and replace the first part of the Unix name (e.g. "auto", "anfs", "usr") with "filer". To map from a Windows path to a Unix path, reverse the direction of the slashes and replace the first part of the Windows name (e.g. "filer" or "Elmer") with "auto" (or in some cases "anfs" or "usr"). You don't need to remember to make the leading double slash a single, because a double slash happens to work under Unix. For example:
/auto/userfiles/* = \\filer\userfiles\*
/anfs/bigdisc/* = \\filer\bigdisc\*
/usr/groups/* = \\filer\groups\*

Note that manipulating the same files under both Windows and Unix may result in confusion because of inherent differences in the protection that can be set on the files under the two systems. However, it shouldn't be possible to do any harm, and the filer should enforce the access controls conservatively.

A Perl script is available to convert between Windows and Unix pathnames on the filer:

$ /anfs/www/tools/bin/filerpath /auto/userfiles/gsm10
\\filer\userfiles\gsm10
$ /anfs/www/tools/bin/filerpath '\\filer\userfiles\gsm10'
/auto/userfiles/gsm10
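
Where filerpath is not to hand, the mapping rule described above can be sketched in a few lines of shell. This is only an illustration of the rule, not the filerpath script itself: the function names unix_to_windows and windows_to_unix are hypothetical, and the choice of Unix prefix (auto, anfs or usr) is an assumption inferred from the three example mappings given above.

```shell
# Sketch of the path-mapping rule; NOT the real filerpath script.

# Unix -> Windows: replace the first component with "filer" and flip slashes.
unix_to_windows() {
    rest=${1#/}        # drop the leading slash
    rest=${rest#*/}    # drop the first component (auto, anfs, usr)
    printf '\\\\filer\\%s\n' "$rest" | tr '/' '\\'
}

# Windows -> Unix: flip slashes, drop the server name, pick a Unix prefix.
# The prefix table is an assumption based on the examples in this document.
windows_to_unix() {
    path=$(printf '%s\n' "$1" | tr '\\' '/')   # //filer/userfiles/gsm10
    path=${path#//}                            # filer/userfiles/gsm10
    path=${path#*/}                            # userfiles/gsm10
    case $path in
        bigdisc|bigdisc/*) prefix=anfs ;;
        groups|groups/*)   prefix=usr ;;
        *)                 prefix=auto ;;
    esac
    printf '/%s/%s\n' "$prefix" "$path"
}

unix_to_windows /auto/userfiles/gsm10
windows_to_unix '\\filer\groups\group-name'
```

Note that Windows paths need single quotes on the Unix command line, as in the filerpath example, so that the shell does not interpret the backslashes.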

Users' home directories

When your account is created, an allocation is made for you on the file server. We create a filespace known as your superhome, which holds both your Unix home directory (known as your unix_home) and your Windows home directory (known as your windows_home), and also your Windows roaming profile. Everybody gets both a Unix and a Windows home even if they have stated that they are going to be Unix- or Windows-only users. Windows-specific information is at Windows access to the File Server.

The allocation is subject to quota: the default quota is (as of early 2009) 1 GByte. This may be enough for casual use but people who are doing serious research are expected to have to ask for an increase. We expect you to be realistic and honest in assessing your needs, bearing in mind that maintaining a high level of availability and reliability of filespace is vastly more expensive than the price of raw disc drives would lead you to believe. All requests for quota increases should be sent to the file server administrators (via mail to sys-admin). Your request is likely to be looked upon more favourably if you can demonstrate that you have read and understood this document and are using the various filespaces available wisely.

Your quota covers both your Unix usage and your Windows usage. You can examine your current usage and quota at http://www.cl.cam.ac.uk/local/sys/quotareport/ which requires Raven authentication to establish your identity. On most Linux systems there is also a command cl-rquota which will give the same information. The native quota command which comes with the operating system will not usually work in our environment.

The home disc allocation is on the file server, so that it is already quite robust; it is also backed-up.

You address your Unix home directory via /home and your CRSID: thus user gsm10's directory would be

  /home/gsm10/

The directory is in fact auto-mounted, and only springs into existence when you ask for it. Some Unix commands will reveal the "real" file structure that underlies this automounting:

$ df /home/gsm10
Filesystem           1K-blocks      Used Available Use% Mounted on
elmer:/vol/vol1/homes-1
                      62914560  43748280  19166280  70% /Nfs/Mounts/homes-1

The df command reveals here that the directory /home/gsm10 under Unix is actually /Nfs/Mounts/homes-1/gsm10/unix_home, and that it really exists within a particular directory structure on elmer. There will also be a /Nfs/Mounts/homes-1/gsm10/windows_home, which houses the Windows home directory. As mentioned above, the directory which houses these separate homes (and your Windows roaming profile), i.e. /Nfs/Mounts/homes-1/gsm10 in this example, is called your superhome. You should never refer to your home filespace by any of these more specific routes; use only /home/ followed by your CRSID. This is because system administrators reserve the right to move filespaces around to correct faults or balance allocations – this should be transparent to you, and the /home route will always work, whereas other routes may not be permanent.

The preferred path for viewing your superhome is via /auto/userfiles/ under Unix or \\filer\userfiles under Windows.

Your Windows home directory is mapped to a drive letter (typically Z:) when you log on to a Windows machine that's connected to the network.

Windows roaming profiles are copied to the local machine from the file server whenever you log on, and copied back to the file server when you log off. This copying is a strong disincentive to keeping any but the smallest amounts of information in your roaming profile: until you log off again, none of the updated information is on the file server, so none of it has file server or backup protection; and if you have large amounts of data in your roaming profile, logging on and off is a slow business. Despite the insidious way that Microsoft software encourages you, don't store things on your 'desktop', in My Documents (unless you have redirected it), or the like; store everything in your Windows home directory, or subdirectories of that.

Users of workstations do not get a roaming profile by default, so you should ensure that you redirect your My Documents folder to the Z: drive. Every terminal server user does get a roaming profile, so that you can log in to any of the terminal servers and see the same desktop. However, that profile is capped at 30 MB. If you exceed the limit you will be nagged until you tidy up, and the profile will not be saved.

The file server is particularly good at keeping large amounts of static data. When a file changes it keeps track of the differences from the previous version. For this reason it is much less of a load on the file server keeping large files which change rarely (if at all) than it is keeping small files which change frequently. The rate of change of files is known as "churn" – the filer is best at keeping files with low churn. Because of this we would prefer users to have a large enough quota to hold their working set of files, rather than continuously copying files on and off their filer space into one of the alternative storage spaces detailed below. If you find yourself continuously copying files to other places to avoid going over quota then please a) tidy up and get rid of anything you don't need, b) ask for an increase in quota (via mail to sys-admin).

As was stated above, if you use a Windows machine your roaming profile is copied back to the file server when you log off. Every file is copied, and then some adjustments are made to preserve the dates of files – so even if it doesn't look as though files have changed they will be regarded as new files by the file server. Thus keeping anything at all in your roaming profile results in high filer churn – this is a bad thing, so please avoid it if possible.

Everything in your home directory (and superhome) is intended for your personal use. Attempting to share this space with other people causes a number of difficulties, especially with disc quotas, and should be avoided. Please do not create directories and invite other people to create files in them, or accept invitations from others to write into their space. Collaborative work should be done in a group filespace.

Temporary disc space

Under Linux, there is system-provided temporary space in a directory called /tmp (alternatives are /var/tmp and /usr/tmp). Applications (from web browsers to compilers) tend to use this space on the user's behalf, but it's typically not terribly big and users should employ scratch space (see subsection "Local scratch space", below) for anything more than trivia. /tmp can provide very fast access, as it is often an "in-memory" filing system.

Filling up /tmp can prevent some applications from working, so the directory should only be used for small items.

Furthermore, Linux reboot sequences are likely to delete things in /tmp, and system management scripts may do the same from time to time. Nothing in /tmp is backed-up either, so it's a rash user who dumps things in /tmp that are anything other than "truly" temporary.

System-wide temporary space

There is also space on the file server that you can access as /anfs/bigtmp. This is a single resource for the whole department, and is cleared regularly. It's much larger than the typical /tmp partition, but will be slower, since it's on the other side of the network.

Local scratch space

"Scratch" space is space that is not backed up. If a file in scratch space is deleted it is permanently lost.

Scratch space is provided, wherever possible, on a machine's local disc; there will be directories in the machine's file system called /local/scratch, /local/scratch-1, and so on. To start with, you (the user) won't have access to anything in a scratch directory: you need to create a directory to use. Under Unix the command

$ sudo cl-mkscratchdir

will make you a directory on /local/scratch; the directory's name is your user identifier, so if user gsm10 were to issue the command, he would get a directory /local/scratch/gsm10. If given the extra argument "1" it will create /local/scratch-1/gsm10 etc.

cl-mkscratchdir has a number of obscure failure modes, and just one that is relatively common:

  error: /local/scratch is on the / partition.
  Please mail sys-admin for help.

This error happens on a system that hasn't been set up to allow local scratch in a separate partition (it is unwise to use the root partition for scratch, as unpleasant things can happen if it fills up). As the message says, you will probably need assistance from sys-admin, although users of FC6 machines may be able to do this repartitioning for themselves.
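
To see in advance whether a machine is likely to give the error above, you can ask df which file system holds /local/scratch. The following is a minimal sketch only: the function names mount_point_of and on_root_partition are hypothetical, and the test it applies is a guess at the condition cl-mkscratchdir reports, not a description of how that command actually works.

```shell
# Print the mount point of the file system that holds the given path.
# (Assumes the mount point itself contains no spaces.)
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeeds if the given path lives on the root ("/") partition.
on_root_partition() {
    [ "$(mount_point_of "$1")" = "/" ]
}

dir=/local/scratch
if [ ! -d "$dir" ]; then
    echo "$dir does not exist on this machine"
elif on_root_partition "$dir"; then
    echo "$dir is on the / partition"
else
    echo "$dir is on its own partition: $(mount_point_of "$dir")"
fi
```

The -P option asks df for its portable one-line-per-file-system format, which avoids the wrapped output seen in the df example earlier in this document.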

Since the disc holding the data is directly connected to the computer you are using, access to data on a scratch disc is typically very fast.

Scratch space is as resilient as the disc (or discs) that supports it. Scratch space is ideally suited to bulky data that can be regenerated relatively easily.

System-wide scratch space

The department's central file server maintains "globally available" scratch space, known as bigdisc. You may ask for an allocation of such space by mail to sys-admin; when the request is granted, a directory is created for you, and a quota allocation made. The directory name contains the user's name, so user gsm10 would be allocated a directory

  /anfs/bigdisc/gsm10

Since the disc holding the data is remote from the computer you are using, data access is slower than to local scratch (see Local scratch space). On the other hand, the data is accessible from any managed computer to which you have access; it is thus useful for common-access items such as local caches of CVS repositories, etc.

Your global scratch space is stored on the file server, with all the advantages that implies (see Integrity of filer disc space). The space is not, however, backed-up.

"Group" space

The name "group" comes from the way such space is categorised in "traditional" Unix; the space may be allocated for use by a research or administrative group of people, or it may be a container for things which are stored together for the systems administrators' convenience.

Group file space is stored on the file server, and accessed under Unix as:

/usr/groups/group-name

(just as with home directories, the directories don't exist a priori on your machine: they are created when you need them, by the auto-mounter).

On a Windows machine connected to the Lab's network, group space is available as

\\filer\groups\group-name

You may map that to a "network drive", or assign it as a "network place", as you choose.

Group space allocation is subject to quota, and is negotiated by the researcher (typically a project principal investigator) who arranges for the space in the first place. By contrast to home directory quotas, group space quotas apply to the container, rather than to the user who has access to the container.

If the initial estimate of the amount of space needed proves faulty, the "owner" of the space may attempt to renegotiate the quota (via mail to sys-admin).

Operating system disc space

This space is not normally accessible to users.

Summary of file space types

Type              Availability       Advantages                       Disadvantages
Operating system  unavailable
Temporary         local system only  high speed                       no backup; system deletes files; very little space
Global temporary  most machines      fairly generous amount of space  NFS slows access; no backup; system deletes files
Local scratch     local system only  high speed                       no backup
Global scratch    managed machines   file server                      NFS slows access; no backup
Home space        most machines      file server; backup              NFS slows access; quota restrictions quite tight
Group space       most machines      file server; backup              NFS slows access; quota restrictions quite tight

File system integrity

So what happens when things go wrong? It depends on the type of file system you're using...

Integrity of workstation discs

To first order, workstation discs have no protection at all. If the disc fails, the data is lost; this is of course no problem as far as the operating system is concerned (it can easily be reloaded onto a new disc drive), but the /local/scratch* disc space will also be lost.

However, most "new" Linux workstations come with two system discs, and they are typically organised as a mirrored pair, on managed machines. On such machines, there's a chance that a broken disc can be replaced with rather little down time. In such cases, it is often possible to recover data in the scratch areas.

DO NOT rely on such chances. Assume that anything on /local/scratch* could be lost at any time, so do not put anything there unless it can be retrieved with no more than modest effort.

Integrity of filer disc space

The filer is a Redundant Array of Independent Discs, also known as a RAID array. We use double RAID, i.e. the filer can survive a double disc failure. The redundancy allows the filer to spot and correct really quite serious errors in the discs holding the files (behind the scenes, it tells system managers about such problems, so that potentially faulty discs can be replaced). The lab also has a second filer in a separate building which partially mirrors the main filer. Thus the file server (and its mirror) protects you from any "ordinary" disc failure; nothing short of a catastrophe which affects two widely separated buildings will actually lose your files.

Snapshots

In addition to its RAID facilities, the filer also offers "snapshots" of the state of directories. Snapshots are really useful when you find you've deleted something you didn't intend to delete...

For every directory that is maintained on the filer, data about "old" files is maintained; new data are compiled every hour. You can look at these data, as if they were organised as directories: these directories "contain" the files, as they were at the time the snapshot was taken.

So: each directory on the file server contains a directory whose name is .snapshot (under Unix), or ~snapshot (under Windows). This directory doesn't appear in directory listings or Windows folder displays, and it doesn't respond to filename completion (under Unix). List the contents of the snapshot directory, and you see something like:

drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 00:00 sv_daily.0
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 17 00:00 sv_daily.1
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 16 00:00 sv_daily.2
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 15 00:00 sv_daily.3
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 14 00:00 sv_daily.4
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 13 00:00 sv_daily.5
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 14:00 sv_hourly.0
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 13:00 sv_hourly.1
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 04:00 sv_hourly.10
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 03:00 sv_hourly.11
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 02:00 sv_hourly.12
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 01:00 sv_hourly.13
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 23:00 sv_hourly.14
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 22:00 sv_hourly.15
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 21:00 sv_hourly.16
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 20:00 sv_hourly.17
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 19:00 sv_hourly.18
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 18:00 sv_hourly.19
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 12:00 sv_hourly.2
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 17:00 sv_hourly.20
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 16:00 sv_hourly.21
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 11:00 sv_hourly.3
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 10:00 sv_hourly.4
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 09:00 sv_hourly.5
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 08:00 sv_hourly.6
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 07:00 sv_hourly.7
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 06:00 sv_hourly.8
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 19 05:00 sv_hourly.9
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 18 00:00 sv_weekly.0
drwxr-xr-x 113 ckh11 ckh11 49152 Mar 11 00:00 sv_weekly.1

(note, this was done under Linux, using 'ls -lu': this gives you the time each snapshot was made.)

As you can see, there were (when that listing was copied) 22 hourly snapshots, 6 daily snapshots (taken at midnight) and 2 weekly snapshots (taken on Sundays). Actually, an hourly snapshot taken at midnight is immediately renamed as a daily snapshot (so there may be fewer sv_hourly.* directories). Similarly, a daily snapshot taken on a Sunday is immediately renamed as a weekly snapshot.

If you require a snapshot that is older than the ones you can see, it is likely to be available elsewhere: contact the Sys Admins, who may be able to help.
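
Since .snapshot has to be named explicitly, it can be handy to check which of the visible snapshots still hold a copy of a particular file. The function below is a hedged sketch for Unix only; the name snapshots_with is hypothetical, and it assumes it is run from the directory that held the file.

```shell
# Print the path of every snapshot copy of the named file that exists
# under .snapshot in the current directory.
snapshots_with() {
    for snap in .snapshot/*/; do
        if [ -e "$snap$1" ]; then
            printf '%s\n' "$snap$1"
        fi
    done
}

# Example: list the snapshot copies of a file called zzz.
snapshots_with zzz
```

Each path printed can be inspected with 'ls -l' to compare sizes and modification times before deciding which copy to retrieve.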

Using snapshots

When you look at a snapshot directory, you are experiencing something of the smoke-and-mirrors of the magician: there's no way to be entirely sure what you're seeing. As a result, there are some surprising things that might happen. One example is:

$ echo foo bar > zzz
$ cp .snapshot/sv_hourly.0/zzz zzz
cp: `.snapshot/sv_hourly.0/zzz' and `zzz' are the same file

where the act of corruption of file zzz hasn't changed the identity of the file.

Furthermore, while you actually own the snapshot directories, you can't write to them. So in particular, you can't mv files from them (under Unix), or cut-and-paste (under Windows).

In summary, then, retrieving files from a snapshot must always be a "copy" operation, and it's best to use a staging post, as in:

$ cp -p .snapshot/sv_hourly.0/zzz zzz.intermed
$ mv zzz.intermed zzz
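
The two-step copy above can be wrapped in a small shell function so that the staging file is never forgotten. This is a sketch under stated assumptions rather than a supported tool: the name restore_from_snapshot is hypothetical, it must be run from the directory that holds the file, and it reuses the .intermed suffix from the example above.

```shell
# Usage: restore_from_snapshot SNAPSHOT FILE
# Copies FILE out of .snapshot/SNAPSHOT via a staging file, then moves the
# staging file over the current version.
restore_from_snapshot() {
    snap=$1
    file=$2
    src=.snapshot/$snap/$file
    if [ ! -f "$src" ]; then
        echo "no $file in snapshot $snap" >&2
        return 1
    fi
    # -p preserves the modification time and permissions of the old copy.
    cp -p "$src" "$file.intermed" && mv "$file.intermed" "$file"
}
```

For example, 'restore_from_snapshot sv_hourly.0 zzz' performs the same retrieval as the two commands shown above.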

System backup

The discussion above shows that the file server is remarkably resilient, and is very unlikely to lose your files. However, there remains the possibility that the (nearly) impossible might happen. There is also the possibility that you may delete one of your own files in error, and not realise the mistake until after the last snapshot copy of it has disappeared.

To deal with these contingencies we store further snapshots on our secondary filer. Unless you need something from the old tapes that pre-date these snapshots, recovery is not particularly time-consuming.

To ask for a file to be retrieved, mail a request to sys-admin with details of:

  • the file name;
  • the file's path;
  • the date the file was lost