====== Using SLURM ======

The cluster's compute nodes are grouped into the following SLURM partitions (queues):

  * debug
  * batch
  * highmem
  
-"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed.+"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed. The other partitions have no set time limit.
  
To see more information about the queue configuration, use ''sinfo -lNe''.

<code>$ sinfo -lNe
Thu Aug 04 15:08:48 2022
NODELIST   NODES PARTITION   STATE CPUS  S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
compute03      1   highmem    idle    8  2:4:1 322249        0     10   (null) none
compute05      1     batch   mixed   48 2:24:1 386500        0     10   (null) none
compute06      1     batch   mixed   64 2:32:1 257491        0      5   (null) none
compute07      1   highmem    idle    8  1:8:1 101956        0      5   (null) none
hpc            1    debug*    idle    4  1:4:1 128876        0      1   (null) none</code>

The above tells you, for instance, that compute06 has 64 CPUs, and that a job sent to the "highmem" partition ("partition" is the SLURM term for what other schedulers, e.g. Sun Grid Engine, call a "queue") will end up running on either compute03 or compute07.
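
For example, to send a job to one of the high-memory nodes you select that partition when submitting, either with ''-p highmem'' on the command line or with an ''#SBATCH -p highmem'' line in the script header (a minimal sketch; //myjob.sbatch// is a placeholder name):
<code>$ sbatch -p highmem myjob.sbatch</code>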
  
===== Submitting jobs =====
==== Interactive jobs ====
How to get an interactive session, i.e. when you want to interact with a program (like R, etc.) for a limited amount of time, making the scheduler aware that you are requesting/using resources on the cluster:
<code>[aorth@hpc: ~]$ interactive
salloc: Granted job allocation 1080
[aorth@compute05: ~]$</code>

**NB:** interactive jobs have a time limit of 8 hours; if you need more, then you should write an sbatch script.

You can also open an interactive session on a specific node of the cluster by specifying it through the ''-w'' command-line argument:
<code>[jbaka@hpc ~]$ interactive -w compute03
salloc: Granted job allocation 16349
[jbaka@compute03 ~]$</code>
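
The ''interactive'' command is a convenience wrapper provided on this cluster. On a stock SLURM installation a rough equivalent (an assumption; the wrapper's exact flags and defaults may differ) would be:
<code>$ salloc -p debug srun --pty bash</code>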
  
==== Batch jobs ====
We are writing a SLURM script below. The parameters in its header request 4 CPUs in the ''batch'' partition and name our job "blastn". This name is only used internally by SLURM for reporting purposes. So let's go ahead and create a file //blast.sbatch//:
<code>#!/usr/bin/bash -l
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module (exact module name/version may differ on your cluster)
module load blast

blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4</code>
  
In the above, please **DO NOT FORGET the ''-l'' option** on the first ("shebang") line: it makes bash a login shell, which is compulsory for the ''module load'' commands to be interpreted correctly.

We then submit the script with the ''sbatch'' command:
<code>$ sbatch blast.sbatch
Submitted batch job 1082</code>
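
By default, everything the job prints to standard output and standard error is written to a file named ''slurm-<jobid>.out'' in the directory the job was submitted from, so you can follow its progress with, for example:
<code>$ tail -f slurm-1082.out</code>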

Jobs that read and write large amounts of temporary data in your home directory do all of that I/O over the network, which puts a heavy burden on the cluster's shared storage.
Instead, you can use a local "scratch" folder on the compute nodes to alleviate this burden, for example:
  
<code>#!/usr/bin/bash -l
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module (exact module name/version may differ on your cluster)
module load blast

# create a unique working directory in the node-local "scratch" area and
# change into it (the scratch path here is an example; it is cluster-specific)
WORKDIR=$(mktemp -d /var/scratch/${USER}_XXXXXX)
cd "$WORKDIR"

blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4 -out blast.out</code>
  
All output is directed to ''$WORKDIR/'', which is the temporary folder on the compute node. See these slides from [[https://alanorth.github.io/hpc-users-group3/#/2|HPC Users Group #3]] for more info.
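
Since this "scratch" area is local to each compute node, results written there are not visible from your home directory, so a script like the above would typically end by copying its results back to shared storage and cleaning up after itself. A minimal sketch, assuming the ''$WORKDIR'' created in the script above:
<code># copy the results back to home (shared storage), then remove the scratch folder
cp blast.out ~/
cd && rm -rf "$WORKDIR"</code>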
  
==== Check queue status ====
''squeue'' is the command to use in order to get information about the different jobs that are running on the cluster, waiting in a queue for resources to become available, or halted for some reason:
<code>$ squeue
  JOBID PARTITION               NAME     USER ST       TIME   CPUS NODELIST(REASON)
 746596     batch          structure    aorth  R 5-15:27:10      1 compute06
 746597     batch          structure    aorth  R 5-13:49:37      1 compute06
 746885     batch    model-selection    jjuma  R 4-20:45:15      8 compute06
 746998     batch        interactive  afeleke  R      30:09      1 compute06
 746999     batch             blastp    aorth  R       7:20      6 compute05</code>
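
To list only your own jobs, filter ''squeue'' by user:
<code>$ squeue -u $USER</code>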