Using the cluster

Queues and limits

You can submit jobs to any of the following SLURM partitions (also known as queues):

  • small-short
    Jobs requiring up to 24 cores on the same node.
    Maximum run-time 2 days.
  • small-long
    Jobs requiring up to 24 cores on the same node.
    Maximum run-time 30 days.
  • large-short
    Jobs requiring more than 24 cores, which may be distributed across multiple nodes. There is no upper limit on the number of nodes (other than how many there are), but remember that you are using a shared resource. Maximum run-time 2 days.
  • large-long
    Jobs requiring more than 24 cores, which may be distributed across multiple nodes. There is no upper limit on the number of nodes (other than how many there are), but remember that you are using a shared resource. Maximum run-time 30 days.
  • bigmem
    Jobs requesting one or both of the cluster's large memory nodes (2TB per node). Maximum run-time 30 days.
  • vbigmem
    Jobs requesting the cluster's large memory node (4TB). Maximum run-time 30 days.
  • gpu.L40S
    Jobs requesting one or more of the cluster's GPU nodes with NVIDIA L40S GPUs (8 per node). Maximum run-time 4 days. Note that you still need to request the GPUs with the --gres flag. The following line would request two GPUs on a node:
    #SBATCH --gres=gpu:2
  • gpu.A100
    Jobs requesting one or more of the cluster's GPU nodes with NVIDIA A100 GPUs (4 per node). Maximum run-time 4 days. See above (gpu.L40S) for how to request GPUs.
  • gpu.A30
    Jobs requesting the GPU node with two NVIDIA A30 GPUs. Maximum run-time 4 days. See above (gpu.L40S) for how to request GPUs.
  • debug
    Small and short jobs, usually meant for tests or debugging. This partition is limited to 24 cores and a maximum of two hours run-time.
  • interactive
    Meant for interactive jobs requiring a graphical user interface, particularly those requiring a node-locked license. Limited to 24 cores and 4 days.
  • Teaching related partitions
    Occasionally there will be separate partitions with the name of a teaching module. They are only to be used in connection with that module.
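A job script selects its partition with the --partition (or -p) directive; the requested cores and run-time must fit within that partition's limits. As a sketch (the core count and time limit here are illustrative values, not recommendations):

```shell
#!/bin/bash
# Illustrative directives for a job in the small-short partition
# (up to 24 cores on one node, maximum run-time 2 days)
#SBATCH --partition=small-short
#SBATCH --ntasks=8            # number of cores requested
#SBATCH --time=1-00:00:00     # 1 day; must not exceed the partition's limit
```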

How to find out about the state of partitions
The current state of partitions can be shown with the command
sinfo -s

It produces a one-line description of every partition. Under the heading 
NODES(A/I/O/T)
there are four numbers, which give the numbers of nodes in state Allocated, Idle (i.e. free), Other (unavailable; usually drained or down), and the Total number of nodes in this partition. Note that an "allocated" node has at least one job running, which does not automatically mean that it is fully used and will not accept additional jobs. 
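The output looks roughly like the following (the partition shown and its node counts are illustrative, not the cluster's actual state):

```shell
sinfo -s
# PARTITION    AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
# small-short  up     2-00:00:00  10/4/1/15       compute[001-015]
```

Here 10 nodes are allocated, 4 idle, 1 unavailable, out of 15 in total.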

Submitting and monitoring jobs 

Calculations need to be submitted to the SLURM queues and will be scheduled to run on the requested compute nodes when those become available. A job script is submitted using the sbatch command:

sbatch <job-script>

Example job scripts for different programs can be found in the directory /usr/local/examples. Please copy one of those and adjust it to your requirements. 
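A typical workflow, then, is to copy an example, adapt it, and submit it (the example filename below is hypothetical; list /usr/local/examples to see what is actually available):

```shell
# List the available example scripts
ls /usr/local/examples
# Copy one (hypothetical filename) and adapt it to your needs
cp /usr/local/examples/example-job.sh myjob.sh
# Edit myjob.sh: partition, cores, run-time, program input
sbatch myjob.sh
```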

You can monitor submitted batch jobs with the squeue command. On its own, it shows all jobs currently in the queues. To show only your own, use the -u flag:

squeue -u <user>

The output will be similar to the following:

  JOBID  PARTITION      NAME  USER ST       TIME  NODES NODELIST(REASON)
1002462      large  B6N3_opt  akg5  R 3-14:41:13      1 compute067
1002556      large   SAKG_5E  akg5  R 2-03:38:21      1 compute066
1002654      small   SKG_7aE  akg5  R    4:48:38      1 compute066

The meaning of the different fields is as follows:

JOBID: Job number. You need this to find more information about the job or to cancel it (see below).
PARTITION: queue.
NAME: Job name as given with the -J flag in the sbatch command or the #SBATCH -J directive in the job script.
USER: The owner of the job.
ST: Status: R (running), PD (pending, i.e. waiting) or CG (completing, i.e. finishing).
TIME: Run-time so far.
NODES: Number of nodes requested.
NODELIST(REASON): Nodes in use (if running); reason for not running otherwise:
  (Resources): waiting for the requested nodes.
  (Priority): There are other jobs with higher priority.
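The default output truncates long job names. If you need more detail, squeue's -o/--format option selects fields and column widths, for example:

```shell
# Show your own jobs with a wider (30-character) name field
squeue -u $USER -o "%.8i %.12P %.30j %.8u %.2t %.12M %.5D %R"
```

See "man squeue" for the full list of format specifiers.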

A batch job can be deleted with the command

scancel <jobid>

If it is already running, it will be stopped.
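scancel also accepts a user filter, so all of your own jobs can be removed at once:

```shell
# Cancel a single job by its job ID
scancel 1002462
# Cancel all of your own jobs (use with care)
scancel -u $USER
```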

Example job

Here is a typical batch job:

#!/bin/bash
#SBATCH --job-name=castep-test
#SBATCH --partition=large-short
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --output=%x_%j.log

# Load required modules
module unload mpi
module load openmpi/5.0.5
module load ucx/1.16.0

export LD_LIBRARY_PATH=/software/GCCLIBS/lib/:$LD_LIBRARY_PATH

# Run vasp
srun --mpi=pmix_v3 /software/VASP/vasp.5.4.4.pl2/bin/vasp_gam

The job script consists of two sections: SLURM directives determining the requested resources and the job's appearance in the queue, and commands to be executed once the job is running. This job will show up as "castep-test" in the squeue output. It requests one node with 64 cores and will run in the large-short partition; its output goes to a log file named after the job name and job ID (%x_%j.log). The execution section sets up the path and environment for a specific compiler/MPI combination by loading the correct modules, and then runs the application in parallel using MPI.

This example uses VASP. Note that this is commercial software: do not use VASP unless your group has a license!

Installing your own software

If you require software that needs to be installed in a location other than your home directory, or that may be of use to other users, you can request its installation in a publicly accessible location. Your fellow users will thank you for it. It is, however, possible and allowed to install software in your own directory. It is your responsibility to ensure that any software you install is legal, used ethically, and used only for the purposes of your research. This means in particular that licensed software is only used within the conditions of its license. The method of installation will depend on the software in question, but the most commonly used approaches are CONDA or compilation from source.

CONDA

Many popular open source Python (and other) packages are available within the CONDA framework. Hypatia provides a location for installing your packages in the directory /gpfs01/software/conda/<user>. The command for setting up your initial environment is

install-conda

Accept all defaults. It will ask permission to add its initialisation to your .bashrc file. This is OK for most purposes and simplifies future use.

If you need multiple packages with incompatible dependencies, please familiarise yourself with conda environments.
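As a sketch, a separate environment keeps incompatible dependencies apart (the environment name, Python version and packages below are examples):

```shell
# Create an isolated environment with its own Python and packages
conda create -n myproject python=3.11 numpy scipy
# Switch into it before installing or running anything
conda activate myproject
# Leave it again when done
conda deactivate
```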

Compiling your own software

Software distributed as source code (Fortran, C or C++, and possibly parallelised using MPI) needs to be compiled before execution. Several compilers and MPI versions are installed. For most purposes, the GNU compilers, in combination with Open MPI, give the most efficient and reliable executables. The compilers are in your default search path. You set the environment for MPI by loading the appropriate modules. Here is an example:

# Set up environment for Open MPI with gnu compilers
module load openmpi/5.0.5
module load ucx/1.16.0
export UCX_WARN_UNUSED_ENV_VARS=n
# Set path for OpenBLAS, fftw3 and hdf5
export LD_LIBRARY_PATH=/software/GCCLIBS/lib/:$LD_LIBRARY_PATH

This also shows how to set the library path for OpenBLAS and other commonly used libraries.

A common problem with older Fortran MPI programs is a compilation failure because of "argument mismatch". If this happens, please add the flag "-fallow-argument-mismatch" to the Fortran flags in your makefile. Please ask if you need advice on this.
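Putting this together, compiling an MPI Fortran program against the libraries above might look like the following (the source and executable names are hypothetical):

```shell
# Environment for GNU compilers with Open MPI
module load openmpi/5.0.5
module load ucx/1.16.0
export LD_LIBRARY_PATH=/software/GCCLIBS/lib/:$LD_LIBRARY_PATH
# mpif90 wraps gfortran; -fallow-argument-mismatch helps with older Fortran code
mpif90 -O2 -fallow-argument-mismatch -o myprog myprog.f90 \
    -L/software/GCCLIBS/lib -lopenblas
```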

Running an Interactive Job on Hypatia

Interactive jobs give you a shell on a compute node so you can run commands directly (e.g. testing software, compiling, short experiments). Use them responsibly and always clean up when you are done.

  1. Login to the login node.

    ssh <username>@hypatia.st-andrews.ac.uk

     

  2. Start a shell on the allocated compute node.

    srun --pty --x11 -p <partition> -n <number of cores> bash

    See "man srun" for all available options. Your prompt should now reflect the compute node hostname.

  3. Run your commands on the compute node.

  4. When you are finished, exit the compute-node shell. 

    exit

Don't hog resources unless you need them. Your requested cores will remain unavailable to others until you exit your interactive shell.

Important: Avoid running heavy workloads on the login node. Always use an interactive job or a batch job for compute-intensive work. You can use any partition (not just "interactive") for interactive jobs, as long as there are resources available in that partition.

Using Jupyter Notebooks from your desktop browser

  1. Login to the login node.

    ssh <username>@hypatia.st-andrews.ac.uk
  2. Create an interactive job (see above).

    salloc --partition=interactive --ntasks=1 --cpus-per-task=N
  3. Run a bash shell on the compute node.

    srun --pty bash
  4. Create a python virtual environment.

    python3 -m venv env
  5. Install Jupyter.

    pip3 install jupyter
  6. Run a Jupyter Lab server on the compute node.

    jupyter lab --no-browser --ip=127.0.0.1
  7. Copy the URL that Jupyter prints at the end of its startup output, something like:

    http://127.0.0.1:8888/lab?token=206c7fc5b387a370422eceea1d79a023a90641dcb37ef607

    Paste it into a notepad (somewhere you can access in a moment).

  8. Open a new terminal on your local machine and enter:

    ssh -N -L 8888:localhost:8888 -J username@hypatia.st-andrews.ac.uk username@compute090

    Enter any passwords when prompted. Replace compute090 with the name of the compute node your Jupyter server is running on (shown in your shell prompt and in the squeue output). You should now be able to use the link you copied by pasting it into your local browser.

Note: At the end of the session you must close your Jupyter server and then cancel your interactive job. First find the job number:

squeue -u $USER

Then cancel the job:

scancel <job_id>