mkatari-bioinformatics-august-2013-blastnotes

Differences

This shows you the differences between two versions of the page.

mkatari-bioinformatics-august-2013-blastnotes [2014/07/02 13:58] mkatarimkatari-bioinformatics-august-2013-blastnotes [2014/07/11 09:04] mkatari
Line 6: Line 6:
  
 <code> <code>
-perl /home/mkatari/PerlScripts/getFastaName.pl -f ../blast/Mesculenta_147_v4 \
+perl /home/mkatari/PerlScripts/getFastaName.pl \
 +    -f /home/mkatari/blast/Mesculenta_147_v4 \
     -o scaffold12498.fa \     -o scaffold12498.fa \
     -query scaffold12498     -query scaffold12498
Line 13: Line 14:
 ====== Creating the blast database ====== ====== Creating the blast database ======
  
-Before we can actually perform the blast, we need to prepare the database using **makeblastdb**.
+Before we can actually perform the blast, we need to prepare the database using **makeblastdb**. The input (**-in**) for makeblastdb is a FASTA file with the reference sequence, in this case the cassava genome, which includes chromosomes and scaffolds. You have to define the database type (**-dbtype**) as either "nucl" or "prot". You can also state explicitly that the file contains FASTA sequences (**-input_type** "fasta"); this is the default input type, so the file extension does not matter.
  
 <code> <code>
 makeblastdb -in cassavaV5 -input_type "fasta" -dbtype "nucl" makeblastdb -in cassavaV5 -input_type "fasta" -dbtype "nucl"
 +</code>
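Before building the database it can help to confirm that the input really is the FASTA file you expect. The following is a minimal sketch, not part of the original notes: the helper name `fasta_summary` is hypothetical, and it only assumes that header lines start with `>`.

```shell
# fasta_summary FILE: print the number of records and total residues in a
# FASTA file. (Hypothetical helper, not part of BLAST+.)
fasta_summary() {
    awk '
        /^>/ { n++; next }                          # header line: count one record
        { gsub(/[ \t\r]/, ""); total += length($0) } # sequence line: count residues
        END { printf "%d sequences, %d residues\n", n, total }
    ' "$1"
}

# Usage, assuming the reference file from the makeblastdb command exists here:
if [ -f cassavaV5 ]; then
    fasta_summary cassavaV5
fi
```

If the record count looks wrong (for example zero), the file is probably not FASTA and makeblastdb is unlikely to produce a usable database from it.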
 +
 +====== Running Blast ======
 +
 +Now, to run BLAST we simply have to specify which BLAST program we want to use and provide the respective arguments. For this example we want to see where the scaffold from an older version of the assembly is present in the new assembly, so we are aligning a nucleotide query to a nucleotide database (**blastn**). There is an incredible number of options for blastn. To get a detailed description of all of them, type **blastn -help**.
 +
 +Some of the options I find useful are:
 +  * **-query**: the name of the file with the query sequence
 +  * **-db**: the name of the database
 +  * **-out**: the name of the output file
 +  * **-outfmt**: the format in which to save the results. The default is the traditional report that shows the alignments, but I also use **-outfmt 6**, which saves the hits in tabular format.
 +
 +<code>
 +blastn -query scaffold12498.fa \
 +-db cassavaV5 \
 +-outfmt 6 \
 +-out scaffold12498.cassavaV5.bout
 </code> </code>
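Once the tabular report exists, a quick way to see where the scaffold landed is to rank the hits by bit score. This is a rough sketch rather than part of the original notes: the function name `top_hits` is hypothetical, and it assumes the default 12-column `-outfmt 6` layout, in which the bit score is column 12.

```shell
# top_hits FILE [N]: print the N highest-scoring rows (default 5) of a
# tabular (-outfmt 6) BLAST report. (Hypothetical helper.)
top_hits() {
    file=$1
    n=${2:-5}
    # -g sorts as general numbers, so bit scores written like 1.1e+04
    # are ordered correctly; r reverses to put the best hit first.
    sort -t "$(printf '\t')" -k12,12gr "$file" | head -n "$n"
}

# Usage, assuming the blastn command above has already written this file:
if [ -f scaffold12498.cassavaV5.bout ]; then
    top_hits scaffold12498.cassavaV5.bout 5
fi
```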
  
Line 102: Line 120:
 echo $OUTPUT echo $OUTPUT
  
-/export/apps/blast/2.2.28+/bin/blastx -db /export/data/bio/ncbi/blast/db/nr -query $INPUT -out $OUTPUT  -num_threads 8 -outfmt 6 -evalue 0.00001
+/export/apps/blast/2.2.28+/bin/blastx \
 +    -db /export/data/bio/ncbi/blast/db/nr \
 +    -query $INPUT \
 +    -out $OUTPUT \
 +    -num_threads 8 \
 +    -outfmt 6 \
 +    -evalue 0.00001
 + 
 +# The columns in this output format (-outfmt 6) are:
 +# Fields: query id,
 +#         subject id,
 +#         % identity,
 +#         alignment length,
 +#         mismatches,
 +#         gap opens,
 +#         query start,
 +#         query end,
 +#         subject start,
 +#         subject end,
 +#         evalue,
 +#         bit score
 </code> </code>
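The column order listed above is what makes the tabular output easy to post-process. As an illustration only (the helper names `filter_hits` and `best_hits` and the thresholds are assumptions, not part of the original script), the rows can be filtered on % identity (column 3) and e-value (column 11), or reduced to the best-scoring hit per query (column 12):

```shell
# filter_hits FILE MIN_IDENTITY MAX_EVALUE: keep rows whose % identity
# (column 3) is at least MIN_IDENTITY and whose e-value (column 11) is at
# most MAX_EVALUE. (Hypothetical helper.)
filter_hits() {
    awk -F '\t' -v id="$2" -v ev="$3" \
        '$3 + 0 >= id + 0 && $11 + 0 <= ev + 0' "$1"
}

# best_hits FILE: for each query id (column 1) keep the single row with
# the highest bit score (column 12). Output order is not guaranteed.
best_hits() {
    awk -F '\t' '
        !($1 in best) || $12 + 0 > score[$1] { best[$1] = $0; score[$1] = $12 + 0 }
        END { for (q in best) print best[q] }
    ' "$1"
}

# Usage, assuming $OUTPUT is the tabular file written by the blastx job:
if [ -n "${OUTPUT:-}" ] && [ -f "$OUTPUT" ]; then
    filter_hits "$OUTPUT" 90 1e-10 | head
    best_hits "$OUTPUT" | sort | head
fi
```

Adding `+ 0` forces awk to compare the fields as numbers, which matters because BLAST writes e-values and large bit scores in scientific notation.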
  
mkatari-bioinformatics-august-2013-blastnotes.txt · Last modified: 2015/06/04 12:38 by mkatari