Please note this information was current as of October 2020; see the HPC documentation for the latest information. These notes are specifically targeted at students doing projects with the NLIP Group at the Cambridge Computer Lab. See also this wiki page about other options for accessing GPUs.

Registration

To sign up for the Cambridge Service for Data Driven Discovery (CSD3), aka “HPC”, complete the online application form (Raven login).

Log-in

Once your application has been approved (this can take up to a week, but is usually faster), you can connect to the HPC servers like so:

ssh <username>@login.hpc.cam.ac.uk

Here <username> is your CRSid, and you'll be prompted for your password (by default HPC uses your UIS password, the one you use for email, Raven, etc.). Reusing that password is bad practice security-wise, so set your HPC password to something distinct from your UIS password using the passwd command.
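
For example, with a hypothetical CRSid abc123, logging in and then setting a separate HPC password looks like this:

ssh abc123@login.hpc.cam.ac.uk
passwd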

There are 16 log-in nodes: nodes 1-8 are for GPU work and nodes 9-16 for CPU work, all running the Scientific Linux 7 OS.

Note that the log-in nodes are not intended for running your experiments; use them only for environment set-up, experiment preparation and SLURM workload management.

Modules

The HPC environment is managed through modules and virtual environments. See the modules loaded by default with module list

See a list of available modules: module avail

Load a module with module load <module>, e.g. module load python/3.8 (note there are finer-grained versions such as ‘python-3.6.1-gcc-5.4.0-xk7ym4l’), or load several at once: module load python/3.8 R/3.6

Unload a module with module unload <module> (tab completion works here), or unload all modules with module purge (note this unloads the useful SLURM modules too, so use with care).
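
For example, a typical module session for a Python job might look like the following (the module names and versions are illustrative; check module avail for what is actually installed on the system):

module list
module avail python
module load python/3.8
module list
module unload python/3.8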

Your environment

If you’re working with Python, it’s best to work in a virtual environment: python3 -m venv venvs/demo; source venvs/demo/bin/activate

Update pip and other fundamentals: pip install --upgrade pip; pip install --upgrade numpy scipy wheel; pip install tensorflow==1.15 (or tensorflow-gpu)
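
Putting this together, a fresh environment set-up might look roughly like the sketch below (the Python module version and package choices are illustrative; adjust for your project). Then install the framework your project needs, e.g. the tensorflow line above, checking that it supports the Python version you loaded:

module load python/3.8
python3 -m venv venvs/demo
source venvs/demo/bin/activate
pip install --upgrade pip
pip install --upgrade numpy scipy wheel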

Alternatively, you can set up a conda environment for a more general-purpose virtual environment (see the HPC page about this, noting that you use miniconda rather than anaconda), e.g.:

module load miniconda/3
conda create --prefix ./myenv
source activate ./myenv
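
Once the environment is activated you can install packages into it and deactivate it when done (the package names are illustrative; on newer conda versions use conda deactivate rather than source deactivate):

conda install numpy scipy
source deactivate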

Jobs

Now for an example grammatical error detection (GED) experiment with Marek Rei’s sequence labeller. First make an experiment directory: mkdir my_expt; cd my_expt

  1. git clone https://github.com/marekrei/sequence-labeler.git
  2. cd sequence-labeler
  3. mkdir data embeddings models
  4. from your local machine, copy over the data: scp fce-public.* <username>@login.hpc.cam.ac.uk:my_expt/sequence-labeler/data/, then check it has arrived on the HPC side with ls -lh data/
  5. pre-trained GloVe embeddings from Stanford NLP website: cd embeddings; wget http://nlp.stanford.edu/data/glove.6B.zip; unzip glove.6B.zip; rm glove.6B.zip
  6. edit the config file: cd ../conf/; emacs fcepublic.conf (update the data paths and the embeddings path to be ‘sequence-labeler/data/…’, ‘sequence-labeler/embeddings/…’, etc; a rough sketch of the result follows this list)
  7. save and exit, then return to the experiment directory ready for your SLURM script: cd ../../
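
After editing, the path entries in fcepublic.conf should point at the directories created above. A rough sketch of the relevant lines (the key names and file names here are assumptions from memory, so check them against what is actually in the file):

path_train = sequence-labeler/data/fce-public.train.original.tsv
path_dev = sequence-labeler/data/fce-public.dev.original.tsv
path_test = sequence-labeler/data/fce-public.test.original.tsv
preload_vectors = sequence-labeler/embeddings/glove.6B.300d.txt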

Next, prepare your SLURM script: see the example scripts with ls -l /usr/local/Cluster-Docs/SLURM. Copy the relevant one and edit it where indicated, e.g. for a CPU job: cp -v /usr/local/Cluster-Docs/SLURM/slurm_submit.peta4-skylake slurm_submit; emacs slurm_submit (copy the wilkes2 script instead for a GPU job). Work through the following edits; a sketch of the resulting script follows the list.

  1. change job name, e.g. ‘pudding’
  2. which project should be charged? e.g. BUTTERY-SL3-CPU (see your project names with mybalance), or BUTTERY-SL3-GPU for a GPU job; the free SL3 tier is for testing or running low-priority jobs
  3. how many nodes? 1 if in doubt
  4. how many tasks? 1 if in doubt
  5. (GPUs per node: note you are charged by GPU usage!)
  6. time required? hh:mm:ss, or HPC max if in doubt
  7. change --mail-type to ALL
  8. leave as --no-requeue
  9. add your virtual environment: source ~/venvs/demo/bin/activate
  10. insert application command(s): application="python sequence-labeler/experiment.py sequence-labeler/conf/fcepublic.conf"
  11. with logging: options=">logfile 2>errfile"
  12. save and exit
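
Putting those edits together, the lines you change in a CPU submission script might look roughly like the sketch below (the time limit is illustrative, and the copied template contains further boilerplate, including the lines that assemble and run the command from $application and $options, which you should leave as they are):

#!/bin/bash
#SBATCH -J pudding
#SBATCH -A BUTTERY-SL3-CPU
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --mail-type=ALL
#SBATCH --no-requeue

source ~/venvs/demo/bin/activate

application="python sequence-labeler/experiment.py sequence-labeler/conf/fcepublic.conf"
options=">logfile 2>errfile"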

Now to submit and monitor your job: