Please note this information was current as of October 2020. Please see the HPC documentation for latest info. Also, these notes are specifically targeted at students doing projects with the NLIP Group at the Cambridge Computer Lab. See also this wiki page about other options for accessing GPU.
To sign up for the Cambridge Service for Data Driven Discovery (CSD3), aka “HPC”, complete the online application form (Raven login).
Notes:
Once your application has been approved (can take up to a week, usually faster), you can connect to HPC servers like so:
ssh <username>@login.hpc.cam.ac.uk
Where “username” is your CRSid and you’ll be prompted for your password (HPC by default uses your UIS password, the one you use for email, Raven, etc). Note that this is actually bad practice security-wise. Set your HPC password distinctly from UIS password using the passwd
command.
There are 16 log-in nodes, GPU from 1-8, CPU from 9-16, all running Scientific Linux 7 OS.
Note that the log-in nodes are not intended to run your experiments but only for environment set-up, experiment preparation, SLURM workload management.
The HPC functions through modules and virtual environments. See the default modules with module list
See a list of available modules: module avail
Load a module module load <module>
: e.g. module load python/3.8
(note there are finer-grained versions such as ‘python-3.6.1-gcc-5.4.0-xk7ym4l’) or several at once: module load python/3.8 R/3.6
Unload a module module unload <module>
(can auto-complete), or all modules with module purge
(note this unloads the useful SLURM modules too, use with care).
If you’re working with Python, it’s best to work in a virtual environment: python3 -m venv venvs/demo; source venvs/demo/bin/activate
Update pip and other fundamentals: pip install --upgrade pip; pip install --upgrade numpy scipy wheel; pip install tensorflow==1.15
(or tensorflow-gpu)
Obviously you can set up a conda environment too, for a more general virtual environment (see the HPC page about this, noting that you use miniconda rather than anaconda), e.g.:
module load miniconda/3
conda create --prefix ./myenv
source activate ./myenv
Restrictions:
Now for a GED experiment with Marek Rei’s sequence labeller. First make an experiment directory: mkdir my_expt; cd my_expt
git clone https://github.com/marekrei/sequence-labeler.git
cd sequence-labeler
mkdir data embeddings models
scp fce-public.* login.hpc.cam.ac.uk:my_expt/sequence-labeler/data/; ls -lh data/
cd embeddings; wget http://nlp.stanford.edu/data/glove.6B.zip; unzip glove.6B.zip; rm glove.6B.zip
cd ../conf/; emacs fcepublic.conf
(update data paths and embeddings path to be ‘sequence-labeler/data/…, sequence-labeler/embeddings/…’, etc)cd ../../
Next to prepare your SLURM script: see example scripts here ls -l /usr/local/Cluster-Docs/SLURM
. Copy and edit the script where indicated, e.g. for a CPU job: cp -v /usr/local/Cluster-Docs/SLURM/slurm_submit.peta4-skylake slurm_submit; emacs slurm_submit
(copy the wilkes2 script for a GPU job).
$ mybalance
), or BUTTERY-SL3-GPU if a GPU job, using the free tier to test or run low-priority jobs--mail-type
to ALL
--no-requeue
source ~/venvs/demo/bin/activate
application="python sequence-labeler/experiment.py sequence-labeler/conf/fcepublic.conf"
options=">logfile 2>errfile"
Now to submit and monitor your job:
mybalance
sbatch slurm_submit
showq -u
scancel <job_id>
ls slurm-NNNN.out
(where NNNN is the job id)mybalance
To go further with the HPC, check out Chris Davis’s page (contact ccd38 with your github username for access to the page) about using GPUs with batch jobs. Also be aware of the extensive documentation and the support helpdesk’s email address: support@hpc.cam.ac.uk
NB, in particular it may be useful to know your storage limits: https://docs.hpc.cam.ac.uk/hpc/user-guide/io_management.html
Andrew Caines, apc38, October 2020