mkatari-bioinformatics-august-2013-blastnotes
Differences
This shows you the differences between two versions of the page.
mkatari-bioinformatics-august-2013-blastnotes [2013/08/13 15:18] – created mkatari | mkatari-bioinformatics-august-2013-blastnotes [2013/08/14 11:38] – [Run Blast using sbatch] mkatari | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ########################## | + | [[mkatari-bioinformatics-august-2013|Manny' |
- | # Run Blast using sbatch # | + | |
- | ########################## | + | |
- | 1) create a new sbatch | + | ====== Run Blast using sbatch |
- | '' | + | * Create a new sbatch file (call it '' |
+ | |||
+ | < | ||
#!/bin/bash | #!/bin/bash | ||
#SBATCH -p highmem | #SBATCH -p highmem | ||
#SBATCH -n 8 | #SBATCH -n 8 | ||
- | '' | + | </ |
- | Normally, any other # in a bash script file means that a comment follows. However, since we are executing the file with the sbatch command (see below), sbatch knows to look for #SBATCH and use the information that follows. Here we are telling sbatch to use the highmem partition (the mammoth server) and 8 CPUs to perform the calculation. It is also important to tell the program you are running that it has 8 CPUs available so that it will use them; otherwise you will be reserving 8 CPUs but only using one. | + | Normally any other '' |
- | 2) There are several different job managers and different ways of setting up HPC computer systems. In practice I prefer not to assume that a job that is submitted to a different server will know how to find the different commands or even files. So I like to include the full paths for both. In order to find where the blastx command is located I simply type: | + | * There are several different job managers and different ways of setting up HPC computer systems. In practice I prefer not to assume that a job that is submitted to a different server will know how to find the different commands or even files. So I like to include the full paths for both. In order to find where the '' |
- | ---------------------------------------------- | + | < |
module load blast | module load blast | ||
which blastx # note this will only work if you already have the blast module loaded. | which blastx # note this will only work if you already have the blast module loaded. | ||
/ | / | ||
- | ---------------------------------------------- | + | </ |
- | From Alan's Blast sbatch script example on the wiki, I also know where the nr database is | + | From Alan's Blast sbatch script example on [[using-slurm|How to Use Slurm]], I also know where the nr database is |
- | ----------------------------------------------- | + | < |
/ | / | ||
- | ----------------------------------------------- | + | </ |
- | 3) Edit the contigs.fa.nr.sbatch file to include all changes. Final script looks like this: | + | * Edit the '' |
- | ---------------------------- | + | < |
#!/bin/bash | #!/bin/bash | ||
Line 37: | Line 37: | ||
/ | / | ||
- | ------------------------------ | + | </ |
- | 4) Run the sbatch file. As soon as you run the file a job id will be assigned to your submission. | + | * Run the sbatch file. As soon as you run the file a job id will be assigned to your submission. |
- | ----------------------- | + | < |
sbatch blast.sbatch | sbatch blast.sbatch | ||
- | ----------------------- | + | </ |
You can check the status of all jobs on the cluster by typing: | You can check the status of all jobs on the cluster by typing: | ||
- | ----------------------- | + | < |
squeue | squeue | ||
- | ----------------------- | + | </ |
You can check the details of your specific job by typing: | You can check the details of your specific job by typing: | ||
- | ---------------------------------------------- | + | < |
scontrol show job <your jobid> | scontrol show job <your jobid> | ||
- | ---------------------------------------------- | + | </ |
You can cancel your job by running | You can cancel your job by running | ||
- | ---------------------------------------------- | + | < |
scancel <your jobid> | scancel <your jobid> | ||
- | ---------------------------------------------- | + | </ |
The standard output of your job is redirected to a file called | The standard output of your job is redirected to a file called | ||
- | ---------------------------------------------- | + | < |
slurm-< | slurm-< | ||
- | ---------------------------------------------- | + | </ |
+ | |||
+ | * Now imagine that you have to repeat this exact blast for many different sequences, but you do not want to create a new batch file each time or keep editing the same one. The paths to the input and output files in our current sbatch file are "hard coded" | ||
+ | * Luckily, we can define variables in a bash script and supply their values from the command line. A modified version of the sbatch script is provided below. | ||
+ | |||
+ | < | ||
+ | # | ||
+ | |||
+ | # | ||
+ | # | ||
+ | |||
+ | INPUT=$1 | ||
+ | OUTPUT=" | ||
+ | |||
+ | echo $INPUT | ||
+ | echo $OUTPUT | ||
+ | |||
+ | / | ||
+ | </ | ||
+ | |||
+ | Arguments on a command line are read by the bash script in sequence. The values are automatically assigned to the positional variables $1, $2, $3 ... in the order they appear on the command line. It is a good idea to reassign these to variables with names that make sense to us. Any string of characters (without spaces) provided after the script name will be assigned to $1, and the variable INPUT is then assigned this value. In the script above we also see how to create a new variable OUTPUT, which contains the same information as INPUT but now also contains a " | ||
+ | |||
+ | Now to refer to the value saved in a variable, we simply put $ in front of its name, as shown in the blast command line. | ||
+ | |||
+ | To execute this sbatch file you would simply provide the name of the input file as shown below. | ||
+ | |||
+ | < | ||
+ | sbatch / | ||
+ | </ | ||
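The positional-argument pattern described above can be tried outside of Slurm before it goes into an sbatch file. What follows is only a minimal sketch, not the script from these notes: the function name `derive_output`, the `.blastout` suffix, and the fallback file name `contigs.fa` are all hypothetical and chosen purely for illustration.

```shell
#!/bin/bash
# Sketch only: mirrors the INPUT/OUTPUT idea from the notes.
# The ".blastout" suffix and the name derive_output are hypothetical.

# Build an output file name from an input file name.
derive_output() {
    local input="$1"
    if [ -z "$input" ]; then
        # No argument given: complain on stderr and signal failure.
        echo "usage: derive_output <input file>" >&2
        return 1
    fi
    printf '%s\n' "${input}.blastout"
}

# $1 is the first argument after the script name; fall back to a
# sample name (an assumption) so the sketch also runs with no arguments.
INPUT="${1:-contigs.fa}"
OUTPUT="$(derive_output "$INPUT")"

echo "$INPUT"
echo "$OUTPUT"
```

Quoting the variables (`"$INPUT"` rather than `$INPUT`) keeps a file name with unusual characters from being split into several words. Checking for an empty argument up front means a forgotten file name fails immediately instead of after the job has waited in the queue.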
mkatari-bioinformatics-august-2013-blastnotes.txt · Last modified: 2015/06/04 12:38 by mkatari