Using SLURM
SLURM is a resource manager and job scheduler for high-performance computing clusters. We use a job scheduler to ensure fair use of the research-computing resources by all users, so that no single user can monopolize them. Users who wish to use the cluster must "request" CPU time and may need to "queue" for resources.
Our SLURM is configured with the following job queues (also called "partitions" in SLURM):
- debug
- batch
- highmem
"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed.
To see more information about the queue configuration, use sinfo -lNe.
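Jobs can be directed to a specific partition at submission time with the standard -p option. A sketch, assuming your account is allowed on the target partition; myjob.sbatch is a placeholder for your own script:

```shell
# submit to the high-memory partition instead of the default "debug"
sbatch -p highmem myjob.sbatch
```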
Submitting jobs
Interactive jobs
How to get an interactive session, i.e. when you want to interact with a program (like R, etc.):
[aorth@hpc: ~]$ interactive
salloc: Granted job allocation 1080
[aorth@taurus: ~]$
NB: interactive jobs have a time limit of 8 hours; if you need more, you should write a batch script.
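If the wrapper's defaults don't fit, SLURM's salloc command can also be called directly. A minimal sketch, assuming your site permits it; -p, -n, and --time are standard salloc options:

```shell
# request an interactive allocation: 2 tasks on the batch partition,
# for up to 2 hours (must be within the partition's limits)
salloc -p batch -n 2 --time=2:00:00
```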
Batch jobs
Request 4 CPUs for an NCBI BLAST+ job in the batch partition. Create a file blast.sbatch:
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4
Submit the script with sbatch:
$ sbatch blast.sbatch
Submitted batch job 1082
Batch job using local storage
Users' home folders are mounted over the network (from "wingu"), so when you are on mammoth or taurus every write to disk (i.e. job output) has to make a round trip over the network.
Instead, you can use a local "scratch" folder on the compute nodes to alleviate this burden, for example:
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -n 4
#SBATCH -J blastn

# load the blast module
module load blast/2.2.30+

WORKDIR=/var/scratch/$USER/$SLURM_JOBID
mkdir -p $WORKDIR
echo "Using $WORKDIR on $SLURMD_NODENAME"
echo

# change to working directory on compute node
cd $WORKDIR

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4 -out blast.out
All output is directed to $WORKDIR/, which is the temporary folder on the compute node. See these slides from HPC Users Group #3 for more info.
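Because the scratch folder lives on the compute node, results should be copied back to network storage before the job ends. A minimal sketch of an end-of-script cleanup, assuming the job wrote blast.out into $WORKDIR as above; the destination folder ~/results is a hypothetical example:

```shell
# copy results back to the (network-mounted) home folder,
# then remove the per-job scratch folder on the compute node
cp "$WORKDIR"/blast.out ~/results/
cd ~ && rm -rf "$WORKDIR"
```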
Check queue status
squeue
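Running squeue alone lists all jobs on the cluster. A few common variations, using standard SLURM options; the job ID 1082 is the example from above:

```shell
# show only your own jobs
squeue -u $USER

# show full details for one job
scontrol show job 1082

# cancel a job
scancel 1082
```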
Receive mail notifications
To receive mail notifications about the state of your job, add the following lines to your sbatch script, where <EMAIL_ADDRESS> is your email address:
#SBATCH --mail-user <EMAIL_ADDRESS>
#SBATCH --mail-type ALL
Notification mail types (--mail-type) can be BEGIN, END, FAIL, REQUEUE, or ALL (any state change).
Example:
#SBATCH --mail-user J.Doe@cgiar.org
#SBATCH --mail-type ALL