To see more information about the queue configuration, use ''sinfo -lNe''.
  
<code>$ sinfo -lNe
Thu Aug 04 15:08:48 2022
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
compute03      1   highmem        idle    8    2:4:1 322249        0     10   (null) none
compute05      1     batch       mixed   48   2:24:1 386500        0     10   (null) none
compute06      1     batch       mixed   64   2:32:1 257491        0      5   (null) none
compute07      1   highmem        idle    8    1:8:1 101956        0      5   (null) none
hpc            1    debug*        idle    4    1:4:1 128876        0      1   (null) none</code>
  
The above tells you, for instance, that compute06 has 64 CPUs. It also tells you that a job sent to the "highmem" partition (SLURM's term for what other schedulers, e.g. Sun Grid Engine, call a "queue") will end up running on either compute03 or compute07.
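To restrict the listing to a single partition, the same command accepts a ''-p'' option, e.g.:

<code># list only the nodes belonging to the "highmem" partition
$ sinfo -p highmem -lNe</code>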
  
===== Submitting jobs =====
==== Interactive jobs ====
How to get an interactive session, i.e. when you want to interact with a program (like R, etc.) for a limited amount of time while making the scheduler aware that you are requesting/using resources on the cluster:
<code>[aorth@hpc: ~]$ interactive
salloc: Granted job allocation 1080
[aorth@compute05: ~]$</code>
  
**NB:** interactive jobs have a time limit of 8 hours; if you need more, you should write an sbatch script.
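In an sbatch script the wall time is requested with the standard ''-t''/''--time'' directive (a minimal sketch; the limit actually granted depends on how the partition is configured), e.g. by adding to the script header:

<code>#SBATCH -t 2-00:00:00    # request 2 days of wall time (D-HH:MM:SS)</code>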
  
You can also open an interactive session on a specific node of the cluster by specifying it through the ''-w'' command-line argument:
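For instance, to land on a particular node (a minimal sketch; the node name is illustrative, and this assumes the ''interactive'' wrapper passes ''-w'' through to ''salloc''):

<code>$ interactive -w compute05</code>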
  
==== Batch jobs ====
-Request 4 CPUs for a NCBI BLAST+ job in the ''batch'' partition.  Create a file //blast.sbatch//: +We are writing a SLURM script below. The parameters in its header request 4 CPUs for in the ''batch'' partition, and name our job "blastn"This name is only used internally by SLURM for reporting purposes. So let's go ahead and ceate a file //blast.sbatch//: 
<code>#!/usr/bin/bash -l
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module (name is indicative; see module avail)
module load blast

blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4</code>
  
In the above, please **DO NOT FORGET the ''-l'' option** on the first ("shebang") line, as it is compulsory for the correct interpretation of the ''module load'' commands.

We then submit the script with the ''sbatch'' command:
<code>$ sbatch blast.sbatch
Submitted batch job 1082</code>
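Unless you redirect it with ''-o''/''-e'' in the script header, the job's standard output and error end up in a file named //slurm-<jobid>.out// in the directory you submitted from, so for the job above you would check:

<code>$ cat slurm-1082.out</code>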
Jobs like the BLAST search above read and write a lot of data, which puts a burden on the cluster's shared (network-mounted) storage. Instead, you can use a local "scratch" folder on the compute nodes to alleviate this burden, for example (the scratch path in the sketch below is illustrative; adjust it to the local scratch location on the node):
  
<code>#!/usr/bin/bash -l
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast

# run inside a job-specific scratch directory on the compute node (illustrative path)
WORKDIR=/var/scratch/$USER/$SLURM_JOBID
mkdir -p "$WORKDIR" && cd "$WORKDIR"
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4 -out blastn.out
# copy the results back to the home directory, then clean up
cp blastn.out ~/ && cd && rm -rf "$WORKDIR"</code>
  
==== Check queue status ====
''squeue'' is the command to use to get information about the different jobs that are running on the cluster, waiting in a queue for resources to become available, or halted for some reason:
<code>$ squeue
  JOBID PARTITION               NAME     USER ST       TIME CPUS NODELIST(REASON)
 746596     batch          structure    aorth  R 5-15:27:10      compute06
 746597     batch          structure    aorth  R 5-13:49:37      compute06
 746885     batch    model-selection    jjuma  R 4-20:45:15    8 compute06
 746998     batch        interactive  afeleke  R      30:09    1 compute06
 746999     batch             blastp    aorth  R       7:20    6 compute05</code>
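To narrow the listing down to your own jobs, you can filter by user name:

<code>$ squeue -u $USER</code>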