Some more useful SLURM notes
Reading variables from the command line
Building on Alan's notes on Using SLURM, we can also pass variables to a script on the command line. If you have ten different files that you want to process separately, you don't want to write ten different scripts. One script should be able to do the job, as long as you provide the input file as an argument.
In shell scripting, the first word following the name of your script is available as $1, the second as $2, and so on. Because order matters, you should document for the users of the script what is expected as the first argument, the second argument, and so on.
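A minimal sketch of how the positional parameters are filled in. The `set --` line only simulates a command line so that the snippet runs on its own; the file name, the database name, and the variable names QUERY, DB and NARGS are made up for illustration.

```shell
# Simulate the command line "myscript.sh sample.fa nt".
# In a real script these values come from the user, not from "set --".
set -- sample.fa nt

QUERY=$1    # first word after the script name
DB=$2       # second word after the script name
NARGS=$#    # how many arguments were given

echo "query=$QUERY db=$DB nargs=$NARGS"
```

Swapping the two words on the command line would swap the values of $1 and $2, which is why the expected order needs to be documented.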
In the script you can access these variables, modify them, and create new ones. In the example below, the script accepts a FASTA file name as $1 and stores it in $INPUT, then creates a new variable $OUTPUT that specifies where the output should be stored. Note that when a variable is assigned there is no $ in front of its name; the $ is needed only when reading its value.
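As a separate, smaller illustration of assigning versus reading a variable, here is a sketch that can be pasted into a shell. The file name test1.fa and the TRIMMED variable are assumptions added for the example; stripping the .fa suffix is optional and is not something the workshop script does.

```shell
INPUT=test1.fa                # assignment: no $ and no spaces around =
OUTPUT=$INPUT.output          # reading INPUT needs the $
TRIMMED=${INPUT%.fa}.output   # optional: remove the .fa suffix first

echo "$OUTPUT"
echo "$TRIMMED"
```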
The echo statements print a message to the SLURM report. This can be useful for checking that the job ran to the end.
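    #!/bin/env bash
    #SBATCH -p batch
    #SBATCH -J blastn
    #SBATCH -n 4

    # first command-line argument: the FASTA file to use as the query
    INPUT=$1
    # the output file name is derived from the input file name
    OUTPUT=$INPUT.output

    module load blast/2.2.28+

    echo "Ready to run Blast"
    blastn -query "$INPUT" -db nt -out "$OUTPUT" -num_threads 4
    echo "Blast done"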
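The script above prints "Blast done" whether or not blastn succeeded, and it does not notice a missing argument. One possible way to tighten this is sketched below. The helper names require_input and run_step and the script name blast.sbatch are hypothetical, not part of the workshop script.

```shell
# require_input (hypothetical) rejects a missing argument or a missing file.
require_input() {
    if [ -z "$1" ]; then
        echo "usage: sbatch blast.sbatch <query.fa>" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
}

# run_step (hypothetical) runs a command and reports success or failure
# based on the exit status of that command.
run_step() {
    if "$@"; then
        echo "$1 done"
    else
        echo "$1 failed" >&2
        return 1
    fi
}

# Inside the batch script this might be used as follows (not run here):
#   require_input "$1" || exit 1
#   run_step blastn -query "$INPUT" -db nt -out "$OUTPUT" -num_threads 4
```

Assuming the script is saved as blast.sbatch, it would be submitted as `sbatch blast.sbatch test1.fa`; sbatch passes the words after the script name on to the script as $1, $2, and so on.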
Generating sbatch scripts on the fly
When you have hundreds of files, it is still quite cumbersome to submit the same script manually for each one. In the example below we get a list of the files we want to use as input and create a separate sbatch file for each of them.
The backtick (not the apostrophe; on a US keyboard it is located to the left of the 1 key) can be used to capture the output of a command. Here we get a list of FASTA files whose names start with the word test and store it in the variable $FILES. Notice that $FILES is not a true array: it holds the output of the ls command as a single string, and because it is left unquoted in the for statement, the shell splits it on whitespace into separate file names. The loop then works with one file at a time: at each iteration the file name is stored in the variable $INPUT.
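The capture-and-loop idea can be tried safely in a scratch directory. This is a sketch under assumptions: the file names and the counter variables are invented for the example. It also shows the equivalent $(...) form of command substitution and a plain glob, which avoids ls altogether and copes with spaces in file names.

```shell
# Work in a scratch directory so the example does not touch real data.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1
touch test1.fa test2.fa other.fa

FILES=`ls test*fa`       # backtick form, as described in the text
SAME=$(ls test*fa)       # equivalent $(...) form

COUNT=0
for INPUT in $FILES      # unquoted on purpose: the shell splits the string into words
do
    echo "found $INPUT"
    COUNT=$((COUNT + 1))
done

# A glob written directly in the for statement needs no ls at all.
GLOB_COUNT=0
for INPUT in test*fa
do
    GLOB_COUNT=$((GLOB_COUNT + 1))
done

# Clean up the scratch directory.
cd "$OLDPWD" || exit 1
rm -rf "$WORKDIR"
```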
The BLAST commands stay the same. Note that we echo the entire sbatch script and redirect it into a new sbatch file. Because the script is inside our echo's double quotes, $INPUT and $OUTPUT are replaced by their values when the file is written. Also note that for the inner double quotes to be written as double quotes, rather than closing our echo's double quote, we put the escape character \ in front of them. The escape character tells the shell to treat the character that follows it as a regular character.
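A small sketch of what the escape character does inside double quotes. The LINE variable and the message text are made up for the example; \$HOME is included only to show how a variable can be kept unexpanded until the generated script runs.

```shell
INPUT=test1.fa

# \" stays a literal double quote, $INPUT is replaced now,
# and \$HOME is kept as the text $HOME for later.
LINE="echo \"Running on $INPUT in \$HOME\""

echo "$LINE"
```

The printed line contains literal double quotes, the value test1.fa in place of $INPUT, and the text $HOME left unexpanded.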
The code below creates a new sbatch file for each FASTA file and then submits the job. This is very useful if you are working with hundreds or thousands of files.
    #!/bin/env bash
    #SBATCH -p batch
    #SBATCH -J blastn
    #SBATCH -n 4

    # the output of the ls command is captured in the variable FILES
    FILES=`ls test*fa`

    # loop through all files in FILES; at each iteration INPUT holds one file name
    for INPUT in $FILES
    do
        # this line is printed to the screen
        echo "file name $INPUT"

        # create variables for the sbatch file name and the BLAST output file name
        SBATCH=$INPUT.blast.sbatch
        OUTPUT=$INPUT.output

        # the following echo is saved in the sbatch file, to be executed later
        echo "#!/bin/env bash
    #SBATCH -p batch
    #SBATCH -J blastn
    #SBATCH -n 4
    module load blast/2.2.28+
    echo \"Ready to run Blast\"
    blastn -query $INPUT -db nt -out $OUTPUT -num_threads 4
    echo \"Blast Done\"
    " > "$SBATCH"

        # now that the file has been written, submit it
        sbatch "$SBATCH"

    # end of the loop: the code is repeated (starting at "do") until all files in FILES are done
    done
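As an alternative to escaping every double quote, the same file can be written with a here-document. This is a sketch, not the workshop's method: the helper name write_sbatch is hypothetical, and the sbatch call is left commented out so the snippet can be tried on a machine without SLURM.

```shell
# write_sbatch (hypothetical) writes one batch script for the given input
# file and prints the name of the script it created.
write_sbatch() {
    local input=$1
    local output=$input.output
    local script=$input.blast.sbatch

    # Unquoted EOF: $input and $output are expanded now,
    # and double quotes need no backslash.
    cat > "$script" <<EOF
#!/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4
module load blast/2.2.28+
echo "Ready to run Blast"
blastn -query $input -db nt -out $output -num_threads 4
echo "Blast Done"
EOF
    echo "$script"
}

# Typical use (submission commented out so the sketch runs anywhere):
# for INPUT in test*fa
# do
#     SBATCH=$(write_sbatch "$INPUT")
#     sbatch "$SBATCH"
# done
```

The generated file has the same contents as the one produced by the echo version, so either approach can be used with the same loop.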