SLURM is a resource manager and job scheduler for high-performance computing clusters. We use a job scheduler to ensure fair usage of the research-computing resources by all users, so that no single user can monopolize them. Users who wish to use the cluster must "request" CPU time and possibly "queue" for resources.
Our SLURM is configured with the following job queues (also called "partitions" in SLURM): "debug", "batch", and "highmem".
"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed. The other partitions have no set time limit.
To see more information about the queue configuration, use
[jbaka@compute03 ~]$ sinfo -lNe
Fri Feb  1 15:27:44 2019
NODELIST   NODES PARTITION STATE CPUS S:C:T  MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
compute2       1 batch     idle    64 64:1:1      1        0     10 (null)   none
compute03      1 batch     mixed    8 8:1:1       1        0      5 (null)   none
compute03      1 highmem   mixed    8 8:1:1       1        0      5 (null)   none
compute04      1 batch     mixed    8 8:1:1       1        0      5 (null)   none
hpc            1 debug*    idle     4 4:1:1       1        0      1 (null)   none
mammoth        1 highmem   idle     8 8:1:1       1        0     30 (null)   none
taurus         1 batch     mixed   64 64:1:1      1        0     20 (null)   none
The above tells you, for instance, that compute04 has 8 CPUs while compute2 has 64 CPUs. It also shows that a job sent to the "highmem" partition will end up running on either compute03 or mammoth. ("Partition" is the SLURM term for what other schedulers, e.g. Sun Grid Engine, call a "queue".)
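If you only want the nodes that serve one partition, a small awk filter over that listing is enough. This is a rough sketch rather than part of the cluster's tooling: the function name nodes_in_partition is made up for illustration, and it assumes the column layout shown above (node name in column 1, partition in column 3, with a trailing "*" marking the default partition).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nodes that belong to a given partition.
# Reads `sinfo -lNe`-style output on stdin; assumes column 1 is the node
# name and column 3 is the partition (a trailing "*" marks the default).
nodes_in_partition() {
    awk -v part="$1" '
        { p = $3; sub(/\*$/, "", p) }   # drop the default-partition marker
        p == part { print $1 }
    '
}

# Example with two rows copied from the listing above:
printf '%s\n' \
    'compute03 1 highmem mixed 8 8:1:1 1 0 5 (null) none' \
    'taurus 1 batch mixed 64 64:1:1 1 0 20 (null) none' |
    nodes_in_partition highmem
```

On the cluster you would feed it the live listing instead, e.g. sinfo -lNe | nodes_in_partition highmem.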
To get an interactive session, i.e. when you want to interact with a program (like R, etc.) for a limited amount of time while making the scheduler aware that you are requesting/using resources on the cluster, run the interactive command:
[aorth@hpc: ~]$ interactive
salloc: Granted job allocation 1080
[aorth@taurus: ~]$
NB: interactive jobs have a time limit of 8 hours: if you need more, then you should write a batch script.
You can also open an interactive session on a specific node of the cluster by specifying it with the -w command-line argument:
[jbaka@hpc ~]$ interactive -w compute03
salloc: Granted job allocation 16349
[jbaka@compute03 ~]$
Request 4 CPUs for an NCBI BLAST+ job in the batch partition. Create a file blast.sbatch:
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4
Submit the script with
$ sbatch blast.sbatch
Submitted batch job 1082
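When you submit from another script, it helps to capture the job ID so you can look the job up later. The snippet below is a sketch under that assumption: the helper name job_id_from_sbatch is hypothetical, and it relies only on the "Submitted batch job <id>" message shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the numeric job ID from sbatch's
# "Submitted batch job <id>" message read on stdin.
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Example with the message shown above:
echo 'Submitted batch job 1082' | job_id_from_sbatch
```

On the cluster this could be used as jobid=$(sbatch blast.sbatch | job_id_from_sbatch), followed by squeue -j "$jobid" to check on that one job. sbatch also has a --parsable option that prints the job ID without the surrounding text, which may be simpler if you don't need the message.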
Users' home folders are mounted over the network (on "wingu"), so when you're on mammoth or taurus, anything you write to disk (i.e. job output) has to make a round trip over the network.
Instead, you can use a local "scratch" folder on the compute nodes to alleviate this burden, for example:
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -n 4
#SBATCH -J blastn

# load the blast module
module load blast/2.2.30+

# per-job scratch directory on the compute node's local disk
WORKDIR=/var/scratch/$USER/$SLURM_JOBID
mkdir -p "$WORKDIR"

echo "Using $WORKDIR on $SLURMD_NODENAME"
echo

# change to working directory on compute node
cd "$WORKDIR"

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4 -out blast.out
All output is directed to $WORKDIR/, which is the temporary folder on the compute node. See these slides from HPC Users Group #3 for more info.
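Because the results stay on the compute node's local disk, you will usually want to copy them somewhere permanent and tidy up once the job finishes. The following is one possible sketch, not an official part of the cluster setup: the helper name stage_out and the destination path in the usage note are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy everything from a scratch directory to a
# destination directory, then delete the scratch directory.
# The scratch copy is removed only if the copy succeeded.
stage_out() {
    local workdir="$1" dest="$2"
    mkdir -p "$dest" &&
        cp -r "$workdir"/. "$dest"/ &&
        rm -rf "$workdir"
}

# Demo on throwaway directories (on the cluster these would be
# $WORKDIR and a folder under your home directory):
scratch="$(mktemp -d)"
results="$(mktemp -d)"
echo 'demo' > "$scratch/blast.out"
stage_out "$scratch" "$results/job_output"
ls "$results/job_output"
rm -rf "$results"
```

At the end of a batch script like the one above, the call might look like stage_out "$WORKDIR" ~/results/$SLURM_JOBID, where ~/results is a placeholder for wherever you keep your output. The copy still crosses the network, but only once, rather than on every write during the run.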
squeue is the command to use in order to get information about the different jobs that are running on the cluster, waiting in a queue for resources to become available, or halted for some reason:
[jbaka@compute03 ~]$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  16330     batch interact  pyumbya  R    6:33:26      1 taurus
  16339     batch interact ckeambou  R    5:19:07      1 compute04
  16340     batch interact ckeambou  R    5:12:52      1 compute04
  16346     batch velvet_o  dkiambi  R    1:39:09      1 compute04
  16348     batch interact fkibegwa  R      22:38      1 taurus
  16349     batch interact    jbaka  R       3:27      1 compute03
In addition to the information above, it is sometimes useful to know the number of CPUs (computing cores) allocated to each job: the scheduler queues jobs that ask for resources that aren't available, most often because other jobs are using all the CPUs on the host. To get the number of CPUs for each job and display everything neatly, the command is slightly more involved:
[jbaka@compute03 ~]$ squeue -o"%.7i %.9P %.16j %.8u %.2t %.10M %.6D %10N %C"
  JOBID PARTITION             NAME     USER ST       TIME  NODES NODELIST   CPUS
  16330     batch      interactive  pyumbya  R    6:40:52      1 taurus     1
  16339     batch      interactive ckeambou  R    5:26:33      1 compute04  1
  16340     batch      interactive ckeambou  R    5:20:18      1 compute04  1
  16346     batch velvet_out_ra_10  dkiambi  R    1:46:35      1 compute04  2
  16348     batch      interactive fkibegwa  R      30:04      1 taurus     1
  16349     batch      interactive    jbaka  R      10:53      1 compute03  1

[jbaka@compute03 ~]$ squeue -O username,jobid,name,nodelist,numcpus
USER                JOBID               NAME                NODELIST            CPUS
pyumbya             16330               interactive         taurus              1
ckeambou            16339               interactive         compute04           1
ckeambou            16340               interactive         compute04           1
dkiambi             16346               velvet_out_ra_109_vecompute04           2
fkibegwa            16348               interactive         taurus              1
jbaka               16349               interactive         compute03           1
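To see at a glance how busy each node is, you can total the CPU column per node. This is a sketch only: the helper name cpus_per_node is hypothetical, and it assumes every job runs on a single node (as in the listings above), so the node list is always one plain host name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the allocated CPUs per node.
# Expects "NODE CPUS" pairs on stdin, e.g. from: squeue -h -t R -o "%N %C"
# Assumes every job runs on a single node, as in the listings above.
cpus_per_node() {
    awk '{ cpus[$1] += $2 } END { for (n in cpus) print n, cpus[n] }' | sort
}

# Example using the node/CPU columns from the listing above:
printf '%s\n' 'taurus 1' 'compute04 1' 'compute04 1' 'compute04 2' \
    'taurus 1' 'compute03 1' | cpus_per_node
```

For the six jobs listed above this prints compute03 1, compute04 4 and taurus 2, one per line. Comparing those totals with the CPUS column of sinfo -lNe shows how much room is left on each node. On the cluster, the input would come from squeue -h -t R -o "%N %C", where -h suppresses the header and -t R keeps only running jobs.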