Using SLURM

SLURM is a resource manager and job scheduler for high-performance computing clusters. We use a job scheduler to ensure fair use of the research-computing resources by all users, so that no single user can monopolize them. Users who wish to use the cluster must "request" CPU time and may have to "queue" for resources.

Our SLURM is configured with the following job queues (also called "partitions" in SLURM):

  • debug
  • batch
  • highmem

"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed. The other partitions have no set time limit.

To see more information about the queue configuration, use sinfo -lNe.

$ sinfo -lNe
Thu Aug 04 15:08:48 2022
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
compute03      1   highmem       idle  8       2:4:1 322249        0     10   (null) none
compute05      1     batch       mixed 48     2:24:1 386500        0     10   (null) none
compute06      1     batch       mixed 64     2:32:1 257491        0      5   (null) none
compute07      1   highmem        idle 8       1:8:1 101956        0      5   (null) none
hpc            1    debug*        idle 4       1:4:1 128876        0      1   (null) none

The above tells you, for instance, that compute06 has 64 CPUs, and that a job sent to the "highmem" partition (the SLURM term for a "queue", as the concept is called in other schedulers, e.g. Sun Grid Engine) will end up running on either compute03 or compute07.
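To see only the nodes of a given partition, you can pass its name to sinfo with the standard -p option:

$ sinfo -p highmem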

Submitting jobs

Interactive jobs

To get an interactive session, for example when you want to interact with a program (R, etc.) for a limited amount of time while making the scheduler aware that you are requesting and using resources on the cluster, run the interactive command:

[aorth@hpc: ~]$ interactive 
salloc: Granted job allocation 1080
[aorth@compute05: ~]$

NB: interactive jobs have a time limit of 8 hours; if you need more, you should write an sbatch script instead.

You can also open an interactive session on a specific node of the cluster by specifying it with the -w command-line argument:

[jbaka@hpc ~]$ interactive -w compute03
salloc: Granted job allocation 16349
[jbaka@compute03 ~]$
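When you are finished, simply exit the shell on the compute node; this ends the job and releases the allocated resources back to the scheduler:

[jbaka@compute03 ~]$ exit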

Batch jobs

Below we write a SLURM script. The parameters in its header request 4 CPUs in the "batch" partition and name our job "blastn"; this name is only used internally by SLURM for reporting purposes. So let's go ahead and create a file blast.sbatch:

#!/usr/bin/bash -l
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4

In the above, please DO NOT FORGET the '-l' option on the first ("shebang") line, as it is required for the module load commands to be interpreted correctly.
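You can also add other standard sbatch directives to the header if needed, for instance to choose the output file name or set a wall-time limit (the values below are only examples):

#SBATCH -o blastn.%j.out   # write stdout/stderr to blastn.<jobid>.out
#SBATCH -t 02:00:00        # kill the job after 2 hours of wall time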

We then submit the script with the sbatch command:

$ sbatch blast.sbatch 
Submitted batch job 1082
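By default, sbatch writes the job's stdout and stderr to a file named slurm-<jobid>.out in the directory you submitted from, so the job above can be monitored with:

$ tail -f slurm-1082.out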

Batch job using local storage

Users' home folders are mounted over the network (they live on "wingu"), so when you are working on mammoth or taurus, anything you write to disk (i.e. job output) has to make a round trip over the network.

Instead, you can use a local "scratch" folder on the compute nodes to alleviate this burden, for example:

#!/usr/bin/bash -l
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.2.30+

WORKDIR=/var/scratch/$USER/$SLURM_JOBID
mkdir -p $WORKDIR

echo "Using $WORKDIR on $SLURMD_NODENAME"
echo

# change to working directory on compute node
cd $WORKDIR

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4 -out blast.out

All output is directed to $WORKDIR/, which is the temporary folder on the compute node. See these slides from HPC Users Group #3 for more info.
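Since $WORKDIR is local to the compute node, remember to copy anything you want to keep back to a network location at the end of the script. A minimal sketch, assuming you want the BLAST output back in the directory you submitted the job from:

# copy results back to the submission directory and clean up the scratch space
cp blast.out $SLURM_SUBMIT_DIR/
rm -rf $WORKDIR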

Check queue status

squeue is the command to use to get information about the jobs that are running on the cluster, waiting in a queue for resources to become available, or held for some reason:

$ squeue 
  JOBID PARTITION               NAME     USER ST       TIME   CPUS NODELIST(REASON)
 746596     batch          structure    aorth  R 5-15:27:10      1 compute06
 746597     batch          structure    aorth  R 5-13:49:37      1 compute06
 746885     batch    model-selection    jjuma  R 4-20:45:15      8 compute06
 746998     batch        interactive  afeleke  R      30:09      1 compute06
 746999     batch             blastp    aorth  R       7:20      6 compute05
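To list only your own jobs, use squeue -u; to cancel a job you no longer need, use scancel with its job ID (1082 is just the example job from above):

$ squeue -u $USER
$ scancel 1082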