Back to Manny's Bioinformatics Workshop Home

Some more useful SLURM notes

Reading variables from the command line

Building on Alan's notes on Using SLURM, we can also pass input to a script on the command line. If you have ten different files that you want to process separately, you don't want to have to write ten different scripts. One script should be able to do the job, as long as you provide the input to it.

In shell scripting, the first word following the name of your script is available in the variable $1. Similarly, the second word is assigned to $2. Because order matters, you must document for the user of the script what is expected as the first argument and as each argument that follows.
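The ordering rule can be tried out without submitting a job. The sketch below uses a small function, because a function receives its arguments in $1 and $2 the same way a script does. The name show_args and the argument values are made up for illustration.

```shell
#!/usr/bin/env bash

# show_args is a hypothetical helper, not part of the original notes.
# It prints its first two positional arguments so the ordering is visible.
show_args() {
    echo "first=$1 second=$2"
}

show_args reads.fa nt    # prints: first=reads.fa second=nt
show_args nt reads.fa    # prints: first=nt second=reads.fa
```

Swapping the two words swaps $1 and $2, which is why the expected order has to be documented.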

In the script you can access these variables, modify them, and create new ones. In the example below, the script accepts a fasta file as $INPUT and creates a new variable $OUTPUT to specify where the output should be stored. Note that when a variable is being assigned there is no $ in front of its name; the $ is only used when reading its value.

The echo statements print a message to the slurm report. This can be useful for confirming that the job reached the end of the script.
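Before the full script, the assignment rule can be checked in isolation. This is a minimal sketch with a made-up file name (sample.fa). The BASE variable and the suffix-stripping step are additions for illustration and are not used in the scripts on this page.

```shell
#!/usr/bin/env bash

# Assignment: no $ on the left-hand side, and no spaces around the = sign.
INPUT="sample.fa"

# Reading a value: the $ is required.
OUTPUT="$INPUT.output"
echo "$OUTPUT"            # prints: sample.fa.output

# Optional variation (an assumption, not in the original notes):
# ${INPUT%.fa} removes a trailing ".fa" before a new suffix is added.
BASE="${INPUT%.fa}"
echo "$BASE.blast.out"    # prints: sample.blast.out
```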

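#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# first command-line argument: the fasta file to use as the query
INPUT=$1
# the output file name is derived from the input file name
OUTPUT=$INPUT.output

module load blast/2.2.28+

echo "Ready to run Blast"

blastn -query "$INPUT" -db nt -out "$OUTPUT" -num_threads 4

echo "Blast done"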
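The script above assumes that $1 was given and that blastn succeeded; the final echo prints either way. The sketch below shows one possible way to guard both points. The helpers check_input and run_step and the script name blast.sbatch are hypothetical, and the commands true and false stand in for a real program such as blastn.

```shell
#!/usr/bin/env bash

# check_input (hypothetical helper): fail early when the argument is
# missing or the file cannot be read.
check_input() {
    if [ -z "$1" ]; then
        echo "usage: sbatch blast.sbatch <query.fa>" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "cannot read input file: $1" >&2
        return 1
    fi
    return 0
}

# run_step (hypothetical helper): run a command and report whether it
# succeeded, based on its exit status.
run_step() {
    local label=$1
    shift
    if "$@"; then
        echo "$label finished"
    else
        local status=$?
        echo "$label failed with exit status $status" >&2
        return "$status"
    fi
}

# Demonstration with stand-in commands instead of blastn.
demo_file=$(mktemp)
check_input "$demo_file" && echo "input ok"
check_input "" 2>/dev/null || echo "missing argument detected"
run_step "demo step" true
run_step "failing step" false 2>/dev/null || echo "failure detected"
rm -f "$demo_file"
```

If the script is saved as blast.sbatch, it would be submitted as: sbatch blast.sbatch test1.fa. Words after the script name are passed to the script, so test1.fa arrives as $1.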

Generating sbatch scripts on the fly

In the case where you have hundreds of files, it is still quite cumbersome to submit the same script manually for each one. In the example below we will get a list of the files we want to use as input and create a separate sbatch file for each of them.

The backquote, also called the backtick (not the apostrophe; on a US keyboard it is located to the left of 1), can be used to capture the output of a command. Here we get a list of fasta files that start with the word test and store it in the variable $FILES. Notice that $FILES does not hold just one file name, but the whole whitespace-separated list of names returned by the ls command. Then we start a loop and work with one file at a time: the for loop splits the list into words, and at each iteration one file name is stored in the variable $INPUT.
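The capture-and-loop behaviour can be tried safely in a scratch directory. The sketch below creates three empty files with made-up names and counts how many match test*fa. It also shows two variations that are not in the original notes: the $( ) form, which is equivalent to backticks, and looping over the pattern directly, which does not depend on word splitting.

```shell
#!/usr/bin/env bash

# Scratch directory with made-up file names, removed again at the end.
workdir=$(mktemp -d)
touch "$workdir/test1.fa" "$workdir/test2.fa" "$workdir/other.fa"

# Backticks and $( ) both capture the output of a command.
FILES=`ls "$workdir"/test*fa`
SAME=$(ls "$workdir"/test*fa)

# FILES is one string; leaving $FILES unquoted lets the shell split it
# on whitespace, so the loop sees one file name per iteration.
count_ls=0
for INPUT in $FILES
do
    echo "from ls: $(basename "$INPUT")"
    count_ls=$((count_ls + 1))
done

# Variation: loop over the pattern itself, no ls needed.
count_glob=0
for INPUT in "$workdir"/test*fa
do
    count_glob=$((count_glob + 1))
done

echo "ls found $count_ls files, pattern found $count_glob files"
rm -rf "$workdir"
```

Splitting the captured list on whitespace breaks if a file name contains spaces; the pattern loop does not have that problem.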

The Blast code stays the same. Note that we echo the entire sbatch script and redirect it into a new sbatch file. Because the text is inside double quotes, $INPUT and $OUTPUT are replaced by their values as each file is written. Also note that, in order for the inner double quotes to be used as literal double quotes and not as the closing quote of our echo, we put the escape character \ in front of them. The escape character tells the shell to treat the special character that follows it as a regular character.
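The effect of the escape character is easiest to see on a single line. The sketch below builds one line of a generated script with escaped quotes, then shows a here-document as an alternative way to write several lines. The here-document is a different technique from the echo used on this page, and the file name test1.fa is made up.

```shell
#!/usr/bin/env bash

INPUT="test1.fa"

# \" produces a literal double quote; $INPUT is expanded right away.
line=$(echo "echo \"Ready to run Blast on $INPUT\"")
echo "$line"    # prints: echo "Ready to run Blast on test1.fa"

# Alternative: inside a here-document, double quotes need no escaping.
# $INPUT is still expanded now, while \$HOSTNAME stays as the literal
# text $HOSTNAME and would only be expanded when the generated script runs.
script=$(cat <<EOF
echo "Ready to run Blast on $INPUT"
echo "Running on \$HOSTNAME"
EOF
)
echo "$script"
```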

The code below will create a new sbatch file for each fasta file and then submit the jobs. Very useful if you are working on hundreds or thousands of files.

#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

#the result of the ls command is captured in the variable FILES
FILES=`ls test*fa`

#loop through all files in FILES; at each iteration, INPUT will hold the name of one file
for INPUT in $FILES
do

#this line gets printed to the screen
echo "file name $INPUT"

#creating variables to store values
SBATCH=$INPUT.blast.sbatch
OUTPUT=$INPUT.output

#the following echo is saved in the sbatch file to be executed later
echo "#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

module load blast/2.2.28+

echo \"Ready to run Blast\"

blastn -query $INPUT -db nt -out $OUTPUT -num_threads 4
echo \"Blast Done\"
" > "$SBATCH"

#now that the file is done writing, submit the sbatch file
sbatch "$SBATCH"

#end of the loop. the code is repeated (starting at "do") until all files in FILES are done.
done
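The two sections can also be combined: instead of writing one sbatch file per input, the single script from the first section can be submitted once per file, since sbatch passes the words after the script name to the script as arguments. The sketch below is a dry run under stated assumptions: the script name blast.sbatch, the helper submit_all, and the SUBMIT variable are all hypothetical. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash

# SUBMIT (hypothetical variable) holds the command used for submission.
# The default "echo sbatch" only prints each command; setting
# SUBMIT=sbatch before running would submit the jobs for real.
SUBMIT="${SUBMIT:-echo sbatch}"

# submit_all (hypothetical helper): the first argument is the sbatch
# script, every remaining argument is one input file.
submit_all() {
    local script=$1
    shift
    local INPUT
    for INPUT in "$@"
    do
        # $SUBMIT is left unquoted on purpose so that "echo sbatch"
        # is split into the command and its first argument.
        $SUBMIT "$script" "$INPUT"
    done
}

# Dry run with made-up file names.
submit_all blast.sbatch test1.fa test2.fa
```

In a directory of real inputs the call would be: submit_all blast.sbatch test*fa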
mkatari-bioinformatics-august-2013-more-slurm.txt · Last modified: 2014/06/09 08:19 by mkatari