User Tools

Site Tools


mkatari-bioinformatics-august-2013-blastnotes

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
mkatari-bioinformatics-august-2013-blastnotes [2013/08/13 15:29] mkatarimkatari-bioinformatics-august-2013-blastnotes [2014/07/02 13:58] mkatari
Line 1: Line 1:
 +[[mkatari-bioinformatics-august-2013|Back to Manny's Bioinformatics Workshop Home]]
 +
 +====== Getting specific fasta sequences from reference file ======
 +
 +In case you need to retrieve a specific sequence from a larger fasta file, use one of my perl scripts. Simply provide a pattern that it needs to match in the definition field and it will retrieve the sequence. The **-f** is the reference file, **-o** is the output file name, and **-query** is the pattern it will try to match.
 +
 +<code>
 +perl /home/mkatari/PerlScripts/getFastaName.pl -f ../blast/Mesculenta_147_v4 \
 +    -o scaffold12498.fa \
 +    -query scaffold12498
 +</code>
 +
 +====== Creating the blast database ======
 +
 +Before we can actually perform the blast, we need to prepare the database using **makeblastdb**. 
 +
 +<code>
 +makeblastdb -in cassavaV5 -input_type "fasta" -dbtype "nucl"
 +</code>
 +
 ====== Run Blast using sbatch ====== ====== Run Blast using sbatch ======
  
Line 67: Line 87:
 </code> </code>
  
 +  * Now imagine that you have to repeat this exact blast for many different sequences but you do not necessarily want to have to create a new batch file or keep editing the same one. The path to the input and output files in our current sbatch files are "hard coded". What we want is the ability to change the input and output name without having to edit the file. 
 +  * Lucky for us we can define variables in a bash script and provide the value of the variable from the command line. A modified version of the sbatch script is provided below.
 +
 +<code>
 +#!/bin/bash
 +
 +#SBATCH -p batch
 +#SBATCH -n 8
 +
 +INPUT=$1
 +OUTPUT="$1".output
 +
 +echo $INPUT
 +echo $OUTPUT
 +
 +/export/apps/blast/2.2.28+/bin/blastx -db /export/data/bio/ncbi/blast/db/nr -query $INPUT -out $OUTPUT  -num_threads 8 -outfmt 6 -evalue 0.00001
 +</code>
 +
 +Arguments on a command line are interpreted by the bash script in sequence. The values automatically inherit the variable $1, $2, $3 ... as they are read from command line. It is a good idea to reassign these with variables that have names that make sense to us. Any string of characters (without spaces) provided after the script name will be assigned as $1 and then the variable INPUT will be assigned this value. In the script above we also see how to create a new variable OUTPUT which contains the same information as INPUT but now also contains a ".output"
 +
 +Now to refer to the value saved in the variables we simply put $ infront as shown in the blast command line.
 +
 +To execute this sbatch file you would simply provide the name of the input file as shown below.
 +
 +<code>
 +sbatch /home/mkatari/blast.sbatch /home/mkatari/ndl06-132-velvet31/contigs.fa
 +</code>
mkatari-bioinformatics-august-2013-blastnotes.txt · Last modified: 2015/06/04 12:38 by mkatari