--- mkatari-bioinformatics-august-2013-blastnotes [2013/08/14 11:38] – [Run Blast using sbatch] mkatari
+++ mkatari-bioinformatics-august-2013-blastnotes [2015/06/04 12:38] (current) – mkatari
@@ Line 1: / Line 1: @@
-[[mkatari-bioinformatics-august-2013|Manny's Bioinformaics Workshop]]
+[[mkatari-bioinformatics-august-2013|Back to Manny's Bioinformatics Workshop Home]]
+This page contains a short introduction on how to create a Blast database and how to run Blast using a query. Replace the file names used in the examples with the names of your own sequence files for the scripts to work for you.
+**Please remember**: if you are planning to run Blast directly on the command line, please do it in **interactive** mode. The recommended approach is to use sbatch scripts (described in more detail further down the page).
+====== Getting specific fasta sequences from reference file ======
+In case you need to retrieve a specific sequence from a larger fasta file, use one of my perl scripts. Simply provide a pattern that it needs to match in the definition field and it will retrieve the sequence. The **-f** is the reference file, **-o** is the output file name, and **-query** is the pattern it will try to match.
+<code>
+perl /home/mkatari/PerlScripts/getFastaName.pl \
+    -f /home/mkatari/blast/Mesculenta_147_v4 \
+    -o scaffold12498.fa \
+    -query scaffold12498
+</code>
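To illustrate what retrieving a record by its definition line involves, here is a minimal awk sketch. It is not the code of getFastaName.pl; it assumes a plain substring match on the header, and the file names demo_ref.fa and demo_out.fa are made up for the demonstration.

```shell
# Hypothetical demo data: a tiny two-record fasta file.
printf '>scaffold12498 demo\nACGTACGT\nGGCC\n>scaffold00001 demo\nTTTT\n' > demo_ref.fa

# Print every record whose definition line contains the pattern.
# "keep" is re-evaluated at each header line and then reused for
# the sequence lines that follow it.
pattern="scaffold12498"
awk -v pat="$pattern" '
    /^>/ { keep = (index($0, pat) > 0) }
    keep
' demo_ref.fa > demo_out.fa

cat demo_out.fa
```

Because the match is a substring test, a short pattern such as scaffold1 would also pick up scaffold12498, so use a pattern that is specific enough.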
+====== Creating the blast database ======
+Before we can actually perform the blast, we need to prepare the database using **makeblastdb**. The input (**-in**) for makeblastdb is a fasta file with the reference sequence, in this case the cassava genome, which includes chromosomes and scaffolds. You have to define the database type (**-dbtype**) as either "nucl" or "prot". If your file does not have the correct extension, simply tell makeblastdb that the file contains fasta sequences (**-input_type** "fasta").
+<code>
+makeblastdb -in cassavaV5 -input_type "fasta" -dbtype "nucl"
+</code>
+====== Running Blast ======
+To run blast, we simply have to specify which blast program we want to use and provide the respective arguments. In this example we want to see where a scaffold from an older version of the assembly is located in the new assembly, so we are aligning a nucleotide query to a nucleotide database (**blastn**). There is an incredible number of options for blastn. To get a detailed description of all of them, type **blastn -help**.
+Some of the options I find useful are:
+  * **-query** = the name of the file containing the query sequence
+  * **-db** = the name of the database
+  * **-out** = the name of the output file
+  * **-outfmt** = the format in which to save the results. The default is the traditional output that shows alignments, but I also use **-outfmt 6**, which saves the results in tabular format.
+<code>
+blastn -query scaffold12498.fa \
+-db cassavaV5 \
+-outfmt 6 \
+-out scaffold12498.cassavaV5.bout
+</code>
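Once the tabular file exists, it can be filtered with standard tools. The sketch below assumes the default 12-column layout of -outfmt 6 (percent identity in column 3, alignment length in column 4, bit score in column 12). The rows in demo.bout are invented for illustration and the cutoffs are arbitrary.

```shell
# Hypothetical demo rows in -outfmt 6 layout (tab-separated, 12 columns).
printf 'scaffold12498\tChromosome01\t99.10\t1200\t10\t1\t1\t1200\t5001\t6200\t0.0\t2100\n' >  demo.bout
printf 'scaffold12498\tChromosome07\t88.00\t150\t18\t0\t300\t449\t900\t1049\t1e-40\t160\n' >> demo.bout
printf 'scaffold12498\tScaffold00042\t97.50\t400\t10\t0\t800\t1199\t1\t400\t0.0\t680\n'   >> demo.bout

# Keep hits with at least 95 % identity (column 3) over at least
# 200 aligned bases (column 4), then sort by bit score (column 12),
# best hit first.
awk -F'\t' '$3 >= 95 && $4 >= 200' demo.bout \
    | sort -t "$(printf '\t')" -k12,12nr > demo.filtered.bout

# Show subject id, percent identity and bit score of the surviving hits.
cut -f 2,3,12 demo.filtered.bout
```

For a real run, replace demo.bout with your own output file, for example scaffold12498.cassavaV5.bout, and adjust the thresholds to your data.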
 ====== Run Blast using sbatch ======
@@ Line 18: / Line 58: @@
 <code>
 module load blast
-which blastx # note this will only work if you already have the blast module loaded.
+which blastn # note this will only work if you already have the blast module loaded.
-/export/apps/blast/2.2.28+/bin/blastx
+/export/apps/blast/2.2.28+/bin/blastn
 </code>
@@ Line 36: / Line 76: @@
 #SBATCH -n 8
-/export/apps/blast/2.2.28+/bin/blastx -db /export/data/bio/ncbi/blast/db/nr -query /home/mkatari/ndl06-132-velvet31/contigs.fa -out /home/mkatari/ndl06-132-velvet31/contigs.fa.nr -num_threads 8 -outfmt 6 -evalue 0.00001
+module load blast
+echo "Blast ready to run"
+blastn -db cassavaV5 -query scaffold12498.fa \
+       -out scaffold12498.cassavaV5_2.bout \
+       -num_threads 8 -outfmt 6 -evalue 0.00001
+echo "Blast complete"
 </code>
   * Run the sbatch file. As soon as you run the file a job id will be assigned to your submission.
@@ Line 84: / Line 134: @@
 echo $OUTPUT
-/export/apps/blast/2.2.28+/bin/blastx -db /export/data/bio/ncbi/blast/db/nr -query $INPUT -out $OUTPUT  -num_threads 8 -outfmt 6 -evalue 0.00001
+/export/apps/blast/2.2.28+/bin/blastx \
+    -db /export/data/bio/ncbi/blast/db/nr \
+    -query $INPUT \
+    -out $OUTPUT  \
+    -num_threads 8 \
+    -outfmt 6 \
+    -evalue 0.00001
+# The columns in this output format are:
+# Fields: query id, subject id, % identity, alignment length,
+#         mismatches, gap opens, query start, query end,
+#         subject start, subject end, evalue, bit score
 </code>
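Since -outfmt 6 writes no header line, it can help to add the column names before opening the results in a spreadsheet or in R. This is a small sketch under stated assumptions: the file names demo_hits.bout and demo_hits.tsv and the single data row are hypothetical, and the header labels are simply the field names listed above with spaces replaced by underscores.

```shell
# Hypothetical single hit in -outfmt 6 layout.
printf 'q1\ts1\t100.00\t50\t0\t0\t1\t50\t1\t50\t1e-20\t93.5\n' > demo_hits.bout

# Write a header line followed by the original rows into a new file,
# leaving the blast output itself untouched.
{
    printf 'query_id\tsubject_id\tpct_identity\talignment_length\tmismatches\tgap_opens\tquery_start\tquery_end\tsubject_start\tsubject_end\tevalue\tbit_score\n'
    cat demo_hits.bout
} > demo_hits.tsv

head -n 1 demo_hits.tsv
```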
@@ Line 96: / Line 166: @@
 sbatch /home/mkatari/blast.sbatch /home/mkatari/ndl06-132-velvet31/contigs.fa
 </code>