Differences

This shows you the differences between two versions of the page.

--- using-slurm [2019/02/01 12:17] – jean-baka
+++ using-slurm [2022/08/04 12:08] – aorth
@@ Line 8: / Line 8: @@
   * highmem
-"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed.
+"debug" is the default queue, which is useful for testing job parameters, program paths, etc. The run-time limit of the "debug" partition is 5 minutes, after which jobs are killed. The other partitions have no set time limit.
 To see more information about the queue configuration, use ''sinfo -lNe''.
+<code>[jbaka@hpc ~]$ sinfo -lNe
+Fri Feb  1 15:27:44 2019
+NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
+compute2       1     batch        idle   64   64:1:1      1        0     10   (null) none
+compute03      1     batch       mixed    8    8:1:1      1        0      5   (null) none
+compute03      1   highmem       mixed    8    8:1:1      1        0      5   (null) none
+compute04      1     batch       mixed    8    8:1:1      1        0      5   (null) none
+hpc            1    debug*        idle    4    4:1:1      1        0      1   (null) none
+mammoth        1   highmem        idle    8    8:1:1      1        0     30   (null) none
+taurus         1     batch       mixed   64   64:1:1      1        0     20   (null) none
+</code>
+The above tells you, for instance, that compute04 has 8 CPUs while compute2 has 64 CPUs. It also tells you that a job sent to the "highmem" partition (the SLURM term equivalent to "queue" in the vocabulary of other schedulers, e.g. Sun Grid Engine) will end up being run on either compute03 or mammoth.
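As a rough sketch of how to read that listing programmatically, the helper below picks out the nodes that belong to one partition. The function name ''nodes_in_partition'' is hypothetical (it is not part of SLURM), and the sample rows are copied from the ''sinfo -lNe'' output above so that the snippet runs without a cluster.

```shell
#!/usr/bin/env bash
# nodes_in_partition PARTITION
# Hypothetical helper: reads `sinfo -lNe`-style lines on stdin and prints
# the node name (column 1) of every row whose partition (column 3) matches.
nodes_in_partition() {
  awk -v part="$1" '
    { p = $3; sub(/\*$/, "", p) }   # drop the "*" that marks the default partition
    p == part { print $1 }
  '
}

# Sample rows taken from the listing above (assumed to be representative).
sample='NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
compute03      1     batch       mixed    8    8:1:1      1        0      5   (null) none
compute03      1   highmem       mixed    8    8:1:1      1        0      5   (null) none
hpc            1    debug*        idle    4    4:1:1      1        0      1   (null) none
mammoth        1   highmem        idle    8    8:1:1      1        0     30   (null) none'

printf '%s\n' "$sample" | nodes_in_partition highmem   # prints compute03, then mammoth
```

On the cluster itself, the same function could presumably be fed live data with ''sinfo -lNe | nodes_in_partition highmem''.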
 ===== Submitting jobs =====
 ==== Interactive jobs ====
-How to get an interactive session, ie when you want to interact with a program (like R, etc):
+How to get an interactive session, i.e. when you want to interact with a program (like R, etc.) for a limited amount of time, while making the scheduler aware that you are requesting and using resources on the cluster:
 <code>[aorth@hpc: ~]$ interactive
 salloc: Granted job allocation 1080
@@ Line 27: / Line 41: @@
 ==== Batch jobs ====
-Request 4 CPUs for a NCBI BLAST+ job in the ''batch'' partition.  Create a file //blast.sbatch//:
+We are writing a SLURM script below. The parameters in its header request 4 CPUs in the ''batch'' partition and name our job "blastn". This name is only used internally by SLURM for reporting purposes. So let's go ahead and create a file //blast.sbatch//:
-<code>#!/usr/bin/env bash
+<code>#!/usr/bin/bash -l
+# DO NOT FORGET '-l' on the line above, it enables the module command
 #SBATCH -p batch
 #SBATCH -J blastn
@@ Line 39: / Line 53: @@
 blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4</code>
-Submit the script with ''sbatch'':
+We then submit the script with the ''sbatch'' command:
 <code>$ sbatch blast.sbatch
 Submitted batch job 1082</code>
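When a submission is scripted, it can be handy to keep the job number that ''sbatch'' reports. The snippet below is a minimal sketch that pulls the ID out of the confirmation line shown above; ''job_id_from'' is a hypothetical helper name, and the input is the sample line rather than a live submission.

```shell
#!/usr/bin/env bash
# job_id_from
# Hypothetical helper: reads sbatch's confirmation message on stdin and
# prints the fourth word of the "Submitted batch job N" line, i.e. the job ID.
job_id_from() {
  awk '/^Submitted batch job/ { print $4 }'
}

# Demo with the confirmation line from the example above.
echo "Submitted batch job 1082" | job_id_from   # prints 1082

# On the cluster (not run here), this could look like:
#   jobid=$(sbatch blast.sbatch | job_id_from)
#   squeue -j "$jobid"
```

''sbatch'' also has a ''--parsable'' option that prints the job ID without the surrounding text, which may be simpler if it is available in your SLURM version.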
@@ Line 48: / Line 62: @@
 Instead, you can use a local "scratch" folder on the compute nodes to alleviate this burden, for example:
-<code>#!/usr/bin/env bash
+<code>#!/usr/bin/bash -l
+# DO NOT FORGET '-l' on the line above, it enables the module command
 #SBATCH -p batch
-#SBATCH -n 4
 #SBATCH -J blastn
+#SBATCH -n 4
 # load the blast module
@@ Line 71: / Line 85: @@
 ==== Check queue status ====
-<code>squeue</code>
+''squeue'' is the command to use in order to get information about the different jobs that are running on the cluster, waiting in a queue for resources to become available, or halted for some reason:
+<code>$ squeue
+  JOBID PARTITION               NAME     USER ST       TIME   CPUS NODELIST(REASON)
+     batch          structure    aorth  R 5-15:27:10      1 compute06
+     batch          structure    aorth  R 5-13:49:37      1 compute06
+     batch    model-selection    jjuma  R 4-20:45:15      8 compute06
+     batch        interactive  afeleke  R      30:09      1 compute06
+     batch             blastp    aorth  R       7:20      6 compute05
+</code>
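To get a quick per-user summary instead of the full listing, the output can be piped through a small filter. The sketch below assumes input lines of the form "USER CPUS", which is what ''squeue -h -o "%u %C"'' should produce; ''cpus_per_user'' is a hypothetical helper name, and the demo input reuses the users and CPU counts from the listing above.

```shell
#!/usr/bin/env bash
# cpus_per_user
# Hypothetical helper: reads "USER CPUS" pairs on stdin and prints the
# total number of CPUs per user, sorted by user name.
cpus_per_user() {
  awk '{ cpus[$1] += $2 } END { for (u in cpus) print u, cpus[u] }' | LC_ALL=C sort
}

# Demo with the USER and CPUS columns from the listing above.
printf '%s\n' 'aorth 1' 'aorth 1' 'jjuma 8' 'afeleke 1' 'aorth 6' | cpus_per_user
# prints:
#   afeleke 1
#   aorth 8
#   jjuma 8

# On the cluster (not run here), this could look like:
#   squeue -h -o "%u %C" | cpus_per_user
```

To see only your own jobs, ''squeue -u $USER'' limits the listing to a single user.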