Computer Laboratory

The High Performance Computing Service (HPCS)

The University High Performance Computing Service (HPCS) provides a means of running a number of parallel jobs on potentially large amounts of data.

This page is intended as a quick introduction for Computer Laboratory members, but for more detail see The High Performance Computing Service.

The HPCS consists of a large number of compute nodes (the Darwin cluster), a smaller number of GPU nodes (the Wilkes cluster), and a small number of login servers. Unless they are working on something that really needs a GPU, lab members are expected to use Darwin. Note that the HPCS is under constant development, and anything said here about numbers of nodes, quotas or time allocation is likely to go out of date; see the HPCS documentation for definitive values.

Free vs Paying

Each user falls into one of several Service Levels: SL2 is for paying customers working on medium-scale projects and making irregular use, while SL3 is for non-paying customers making small-scale use, subject to certain running time and priority constraints. (There are also SL1 and SL4 users, but neither will be of interest to lab users.) SL3 users get a fixed allocation of hours per quarter; SL2 users get what they pay for, for as long as their money lasts. Quotas are reset on four fixed dates per year.

The Computer Lab has given some money to the HPCS to allow Lab members to use the service as SL2 users. This money funds two "Projects", named COMPUTERLAB-SL2 (for running on Darwin) and COMPUTERLAB-SL2-GPU (for running on Wilkes). When you run a job you specify the Project in the submission script, and the hours you use are charged to that Project. It is not a huge amount of money, so it is mainly intended for student projects and for people wanting to try out the HPCS to see whether it suits their needs. If you intend to make extensive use of the HPCS you should ask your PI to set up their own Project and provide funding. If you do not charge to a Project, you will receive Service Level 3, which has more restricted running time and lower priority than Project-funded service, but should still be usable. Use of a Project requires approval; for the "Computer Lab" Projects, COMPUTERLAB-SL2 and COMPUTERLAB-SL2-GPU, approval must come from somebody on the sys-admin team.
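Because hours are charged to a Project, it can help to estimate the cost of a job before submitting it. The following is a minimal sketch, assuming that usage is counted in core-hours (nodes × cores per node × wall-clock hours) and that whole nodes are charged; the function name, the cores-per-node figure and the planned jobs are all hypothetical, so check the HPCS documentation for how charging actually works.

```python
def core_hours(nodes: int, cores_per_node: int, wall_hours: float) -> float:
    """Estimate the core-hours one job would consume.

    Assumes whole nodes are charged for the full wall-clock time; this is an
    illustrative accounting model, not confirmed HPCS policy.
    """
    if nodes < 1 or cores_per_node < 1 or wall_hours < 0:
        raise ValueError("nodes and cores_per_node must be >= 1, wall_hours >= 0")
    return nodes * cores_per_node * wall_hours


CORES_PER_NODE = 16  # hypothetical value; look up the real figure for the cluster

# (nodes, wall-clock hours) for each planned job; made-up numbers
planned_jobs = [(4, 2.0), (1, 36.0)]

total = sum(core_hours(n, CORES_PER_NODE, h) for n, h in planned_jobs)
print(f"Estimated usage: {total} core-hours")
```

Comparing such an estimate with the balance remaining on the Project gives a rough idea of whether a separately funded Project is needed.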

Applying for an account

Fill out and submit the HPC application form. It may take up to a week for an account to be issued, but it is usually much faster (next day). If asked for a project, specify either COMPUTERLAB-SL2 (for running on Darwin) or COMPUTERLAB-SL2-GPU (for running on Wilkes, i.e. if you need GPUs), unless your PI already has a project of their own or wishes to set one up.

Operating Procedure

Each user has two directories: a local space with a quota of 40 GB and a scratch space with a quota of 1 TB (as of September 2016).
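Before copying data across, it is worth checking that it will fit within these quotas. The sketch below sums the sizes of the files under a directory and compares the total with a quota. It assumes the quotas are counted in decimal units (the service may use binary units instead), and the function names and the 90% headroom factor are illustrative choices only.

```python
import os

# Quotas quoted in the text (as of September 2016). Decimal units are an
# assumption; the service may count in binary units instead.
LOCAL_QUOTA_BYTES = 40 * 10**9
SCRATCH_QUOTA_BYTES = 10**12


def directory_size(path: str) -> int:
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not count symbolic links
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def fits_quota(size_bytes: int, quota_bytes: int, headroom: float = 0.9) -> bool:
    """True if size_bytes is within the given fraction of the quota."""
    return size_bytes <= headroom * quota_bytes
```

For example, `fits_quota(directory_size("my_dataset"), SCRATCH_QUOTA_BYTES)` would report whether a (hypothetical) `my_dataset` directory leaves some room to spare in the scratch space.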

Without too much detail, the usual working procedure is as follows:

  • log in to one of the login servers: login.hpc.cam.ac.uk (a set of 8 machines),
  • copy your program/script into the local directory,
  • copy your data (if any) into the scratch space,
  • on the login server, compile your program if necessary and check that it runs on a simple case,
  • set up a submission script, which specifies how many instances of the program will run on which data (this is the only difficult part; refer to the HPCS documentation on the UIS webpage),
  • submit the submission script to the scheduler,
  • wait for it to run (you can check its progress). Jobs can run for up to 36 hours and there is no preemption, so in the worst case you could wait 36 hours for other jobs to finish before yours can start. That would require other users to be occupying the entire pool of compute nodes at once; in practice, this happens very rarely. SL2 users have higher priority than SL3 users.
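The submission-script step above can be sketched as follows. This is only an illustration under a stated assumption: that the scheduler accepts SLURM-style `#SBATCH` directives, which this page does not confirm. The job name, node and task counts and the program path are hypothetical, and the 36-hour limit is the figure quoted above, which may be out of date; the HPCS documentation remains the definitive reference.

```python
MAX_WALL_HOURS = 36  # limit quoted in the text above; may be out of date


def format_walltime(hours: float) -> str:
    """Convert a number of hours into an HH:MM:SS string."""
    total_minutes = round(hours * 60)
    h, m = divmod(total_minutes, 60)
    return f"{h:02d}:{m:02d}:00"


def make_submission_script(job_name: str, project: str, nodes: int,
                           tasks: int, wall_hours: float, command: str) -> str:
    """Build the text of a batch script (assumes SLURM-style directives)."""
    if not 0 < wall_hours <= MAX_WALL_HOURS:
        raise ValueError(f"wall_hours must be in (0, {MAX_WALL_HOURS}]")
    lines = [
        "#!/bin/bash",
        f"#SBATCH -J {job_name}",                         # job name
        f"#SBATCH -A {project}",                          # Project to charge
        f"#SBATCH --nodes={nodes}",                       # number of nodes
        f"#SBATCH --ntasks={tasks}",                      # number of program instances
        f"#SBATCH --time={format_walltime(wall_hours)}",  # wall-clock limit
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


script = make_submission_script(
    job_name="trial-run",              # hypothetical
    project="COMPUTERLAB-SL2",         # Project name from the text above
    nodes=1,
    tasks=4,
    wall_hours=1.5,
    command="./my_program input.dat",  # hypothetical program and data file
)
print(script)
```

Generating the script from a function makes it easy to check limits such as the maximum running time before submitting, rather than having the scheduler reject the job afterwards.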

Software

A wide variety of software is available, including Matlab, R, Java and a range of compilers. If you have specific version constraints, or need specific toolboxes, it is best to check availability before applying for an account.

Training

There are periodic one-day training courses on using the HPCS. However, someone familiar with other batch systems, such as Condor, should be able to pick up enough from the HPCS documentation to get by. If you are new to such systems the learning curve may be quite steep, and you should allow several days and quite a lot of trial and error. The HPCS support staff are extremely helpful, and there is extensive documentation available on the UIS webpage.