Using SLURM

SLURM is a resource manager and job scheduler for high-performance computing clusters. We use a job scheduler to ensure fair use of the research-computing resources by all users, so that no single user can monopolize them. Users who wish to use the cluster must "request" CPU time and possibly "queue" until resources become available.

Our SLURM is configured with the following job queues (also called "partitions" in SLURM):

  • debug
  • batch
  • highmem

"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed. The other partitions have no set time limit.

To see more information about the queue configuration, use sinfo -lNe.

[jbaka@compute03 ~]$ sinfo -lNe
Fri Feb  1 15:27:44 2019
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
compute2       1     batch        idle   64   64:1:1      1        0     10   (null) none                
compute03      1     batch       mixed    8    8:1:1      1        0      5   (null) none                
compute03      1   highmem       mixed    8    8:1:1      1        0      5   (null) none                
compute04      1     batch       mixed    8    8:1:1      1        0      5   (null) none                
hpc            1    debug*        idle    4    4:1:1      1        0      1   (null) none                
mammoth        1   highmem        idle    8    8:1:1      1        0     30   (null) none                
taurus         1     batch       mixed   64   64:1:1      1        0     20   (null) none       

The above tells you, for instance, that compute04 has 8 CPUs while compute2 has 64. It also tells you that a job sent to the "highmem" partition ("partition" is SLURM's term for what other schedulers, e.g. Sun Grid Engine, call a "queue") will end up running on either compute03 or mammoth.
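
If you just want to check which nodes belong to a partition, or what time limits are configured, plain sinfo with a couple of its standard format options is enough. The output is omitted here because it depends on the current cluster state:

# list the nodes that belong to a given partition
$ sinfo -p highmem

# one line per partition, showing its time limit and node list
$ sinfo -o "%P %l %N"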

Submitting jobs

Interactive jobs

Use an interactive session when you want to work with a program (R, etc.) directly for a limited amount of time, while still letting the scheduler know that you are requesting and using resources on the cluster:

[aorth@hpc: ~]$ interactive 
salloc: Granted job allocation 1080
[aorth@taurus: ~]$

NB: interactive jobs have a time limit of 8 hours; if you need more, you should write a batch script instead.
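
Judging by the salloc message in the output above, the interactive command wraps SLURM's own allocation tools. A roughly equivalent request using standard SLURM commands (a sketch, not the wrapper's actual implementation) would be:

# allocate one task in the batch partition and open a shell on the allocated node
$ srun -p batch -n 1 --pty bash -i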

You can also open an interactive session on a specific node of the cluster by specifying it with the -w command-line argument:

[jbaka@hpc ~]$ interactive -w compute03
salloc: Granted job allocation 16349
[jbaka@compute03 ~]$

Batch jobs

Request 4 CPUs for an NCBI BLAST+ job in the batch partition. Create a file blast.sbatch:

#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4

Submit the script with sbatch:

$ sbatch blast.sbatch 
Submitted batch job 1082
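
sbatch understands many more directives than the ones used above. The sketch below shows a few commonly useful ones; the job name, time limit, file names and final command are placeholders to adapt to your own job (and remember that, on this cluster, only the "debug" partition enforces a time limit of its own):

#!/usr/bin/env bash
#SBATCH -p batch            # partition (queue) to submit to
#SBATCH -J myjob            # job name shown by squeue
#SBATCH -n 1                # number of tasks
#SBATCH -c 4                # CPUs per task, for multi-threaded programs
#SBATCH -t 02:00:00         # wall-clock limit for this job (HH:MM:SS)
#SBATCH -o myjob_%j.out     # standard output, %j expands to the job ID
#SBATCH -e myjob_%j.err     # standard error

# placeholder command: replace with your actual program
srun hostname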

Batch job using local storage

Users' home folders are mounted over the network (from "wingu"), so when your job runs on mammoth or taurus, anything it writes to your home directory (i.e. job output) has to make a round trip over the network.

Instead, you can use a local "scratch" folder on the compute nodes to alleviate this burden, for example:

#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -n 4
#SBATCH -J blastn

# load the blast module
module load blast/2.2.30+

WORKDIR=/var/scratch/$USER/$SLURM_JOBID
mkdir -p $WORKDIR

echo "Using $WORKDIR on $SLURMD_NODENAME"
echo

# change to working directory on compute node
cd $WORKDIR

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4 -out blast.out

All output is directed to $WORKDIR/, which is the temporary folder on the compute node. See these slides from HPC Users Group #3 for more info.
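
Keep in mind that /var/scratch is local to the compute node, so results left there are not visible from your home directory once the job finishes. A minimal sketch of how such a script could end, assuming you want the BLAST output back in a ~/results folder (the destination path is just an example), looks like this:

# copy results back to the network-mounted home directory
mkdir -p ~/results
cp $WORKDIR/blast.out ~/results/

# clean up the scratch space so the node's local disk does not fill up
rm -rf $WORKDIR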

Check queue status

squeue shows the jobs that are currently running on the cluster, waiting in a queue for resources to become available, or halted for some reason:

[jbaka@compute03 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             16330     batch interact  pyumbya  R    6:33:26      1 taurus
             16339     batch interact ckeambou  R    5:19:07      1 compute04
             16340     batch interact ckeambou  R    5:12:52      1 compute04
             16346     batch velvet_o  dkiambi  R    1:39:09      1 compute04
             16348     batch interact fkibegwa  R      22:38      1 taurus
             16349     batch interact    jbaka  R       3:27      1 compute03

In addition to the information above, it is sometimes useful to know the number of CPUs (computing cores) allocated to each job: the scheduler keeps jobs waiting when they ask for resources that aren't currently available, most often because other jobs are using all the CPUs on the host. To get the number of CPUs for each job and display the whole thing nicely, the command is slightly more involved:

[jbaka@compute03 ~]$ squeue -o"%.7i %.9P %.16j %.8u %.2t %.10M %.6D %10N %C"
  JOBID PARTITION             NAME     USER ST       TIME  NODES NODELIST   CPUS
  16330     batch      interactive  pyumbya  R    6:40:52      1 taurus     1
  16339     batch      interactive ckeambou  R    5:26:33      1 compute04  1
  16340     batch      interactive ckeambou  R    5:20:18      1 compute04  1
  16346     batch velvet_out_ra_10  dkiambi  R    1:46:35      1 compute04  2
  16348     batch      interactive fkibegwa  R      30:04      1 taurus     1
  16349     batch      interactive    jbaka  R      10:53      1 compute03  1

or, alternatively:

[jbaka@compute03 ~]$ squeue -O username,jobid,name,nodelist,numcpus
USER                JOBID               NAME                NODELIST            CPUS                
pyumbya             16330               interactive         taurus              1                   
ckeambou            16339               interactive         compute04           1                   
ckeambou            16340               interactive         compute04           1                   
dkiambi             16346               velvet_out_ra_109_vecompute04           2                   
fkibegwa            16348               interactive         taurus              1                   
jbaka               16349               interactive         compute03           1         
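
If you are only interested in your own jobs, or in the full details of a single job, the following standard SLURM commands are often more convenient (the job ID below is just an example taken from the listing above):

# show only your own jobs
$ squeue -u $USER

# show the full record for a single job (resources, node list, working directory, ...)
$ scontrol show job 16349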