Differences

This shows you the differences between two versions of the page.

--- mkatari-bioinformatics-august-2013-blastnotes [2014/07/02 13:58] – mkatari
+++ mkatari-bioinformatics-august-2013-blastnotes [2014/07/11 09:04] – mkatari
@@ Line 6: / Line 6: @@
 <code>
-perl /home/mkatari/PerlScripts/getFastaName.pl -f ../blast/Mesculenta_147_v4 \
+perl /home/mkatari/PerlScripts/getFastaName.pl \
+    -f /home/mkatari/blast/Mesculenta_147_v4 \
     -o scaffold12498.fa \
     -query scaffold12498
@@ Line 13: / Line 14: @@
 ====== Creating the blast database ======
-Before we can actually perform the blast, we need to prepare the database using **makeblastdb**.
+Before we can actually perform the blast, we need to prepare the database using **makeblastdb**. The input (**-in**) for makeblastdb is a fasta file with the reference sequence, in this case the cassava genome, which includes chromosomes and scaffolds. You have to define the database type (**-dbtype**) as either "nucl" or "prot", and if your file does not have the correct extension, simply tell it that the file contains fasta sequences (**-input_type** "fasta").
 <code>
 makeblastdb -in cassavaV5 -input_type "fasta" -dbtype "nucl"
+</code>
+====== Running Blast ======
+Now to run blast we simply have to specify which blast program we want to use and provide the respective arguments. For this example we want to see where a scaffold from an older version of the assembly is present in the new assembly, so we are aligning a nucleotide query to a nucleotide database (**blastn**). There is an incredible number of options for blastn. To get a detailed description of all the options, type **blastn -help**.
+Some of the options I find useful are:
+-query = the name of the query sequence
+-db = the name of the database
+-out = the name of the output file
+-outfmt = the format in which to save the output. The default is the traditional output that shows alignments, but I also use the value 6, which saves the results in tabular format.
+<code>
+blastn -query scaffold12498.fa \
+-db cassavaV5 \
+-outfmt 6 \
+-out scaffold12498.cassavaV5.bout
 </code>
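Once the tabular output exists, one quick way to see where the old scaffold lands in the new assembly is to total the aligned length per subject sequence. The following is only a minimal sketch and not part of the original notes: the function name `summarize_hits` is made up for illustration, and it assumes the default 12-column layout of `-outfmt 6`.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes).
# Prints: subject id, total aligned length, best bit score (tab-separated),
# sorted so the subject with the largest total aligned length comes first.
# Assumes default -outfmt 6 columns:
#   $2 = subject id, $4 = alignment length, $12 = bit score
summarize_hits() {
    awk -F'\t' '
        {
            len[$2] += $4
            # keep the highest bit score seen for each subject
            if (!($2 in best) || $12 + 0 > best[$2]) best[$2] = $12 + 0
        }
        END {
            for (s in len) printf "%s\t%d\t%s\n", s, len[s], best[s]
        }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `summarize_hits scaffold12498.cassavaV5.bout` would print one line per chromosome or scaffold that the query hit. Summing alignment lengths can count overlapping hits twice, so treat the totals as a rough guide rather than an exact coverage figure.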
@@ Line 102: / Line 120: @@
 echo $OUTPUT
-/export/apps/blast/2.2.28+/bin/blastx -db /export/data/bio/ncbi/blast/db/nr -query $INPUT -out $OUTPUT  -num_threads 8 -outfmt 6 -evalue 0.00001
+/export/apps/blast/2.2.28+/bin/blastx \
+    -db /export/data/bio/ncbi/blast/db/nr \
+    -query $INPUT \
+    -out $OUTPUT  \
+    -num_threads 8 \
+    -outfmt 6 \
+    -evalue 0.00001
+# The columns in this output format are:
+# Fields: query id,
+#         subject id,
+#         % identity,
+#         alignment length,
+#         mismatches,
+#         gap opens,
+#         query start,
+#         query end,
+#         subject start,
+#         subject end,
+#         evalue,
+#         bit score
 </code>
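Because the tabular format has a fixed column order, hits can be filtered afterwards with standard tools. The sketch below is an illustration rather than part of the original script: the function name `filter_hits` and the threshold values in the usage note are assumptions, and the column positions follow the field list above (column 3 is % identity, column 11 is evalue).

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes).
# Keeps hits with % identity >= a minimum and evalue <= a maximum.
#   $1 = tabular BLAST file (-outfmt 6)
#   $2 = minimum % identity
#   $3 = maximum evalue
filter_hits() {
    awk -F'\t' -v min_id="$2" -v max_e="$3" \
        '$3 + 0 >= min_id + 0 && $11 + 0 <= max_e + 0' "$1"
}
```

For example, `filter_hits "$OUTPUT" 95 0.00001` would keep only the rows with at least 95% identity that also pass the same evalue cutoff used in the blastx command; the value 95 is arbitrary and should be chosen for the question at hand.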