mpiblast

Differences

This shows you the differences between two versions of the page.

mpiblast [2010/01/18 09:21] 172.26.0.166 | mpiblast [2010/01/29 09:34] 172.26.0.166
Line 2: Line 2:
 Parallel implementation of NCBI's BLAST algorithm.
  
-http://wiki.bioinformatics.ucdavis.edu/index.php/MPI_Blast
 +  * http://wiki.bioinformatics.ucdavis.edu/index.php/MPI_Blast
 +  * OpenMPI FAQ: http://www.open-mpi.org/faq/
  
-  * **--nfrags=10** Specifies how many database fragments you want to split the original database into. This should be equal to how many different nodes you want to run mpiblast on.
 +<code>$ mpiformatdb -i drosoph.nt -p F --nfrags=12</code>
 + 
 +  * **--nfrags** specifies the number of fragments to split the original database into. This should equal the number of nodes you want to run mpiblast on.
  
-http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl 
  
-===== Updating BLAST Databases ===== 
-http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastdb.html 
 ===== Notes on .ncbirc =====
 Notes on setting up the ''~/.ncbirc'' file from the mpiBLAST installation page: http://www.mpiblast.org/Docs/Install#unix
Line 34: Line 34:
 ===== Frequently Asked Questions =====
 Collection of the more-helpful questions and answers from the [[http://www.mpiblast.org/Docs/FAQ|mpiBLAST FAQ]].
- 
 ====How do I format a huge database?====
  
 Large databases like nt can consume several gigabytes of disk space and it is preferable to store them in compressed form. Starting with mpiBLAST 1.4.0 it is possible to pipe FastA formatted sequence data into mpiformatdb. This feature provides the ability to directly format a compressed (gzip/bzip etc.) database using command line syntax like:
 <code>$ zcat nt.gz | mpiformatdb -i stdin -N 100 -t nt -p F</code>
 +
 +==== SGE Support ====
 +See this FAQ entry: http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
 +
 +<code>$ ompi_info | grep gridengine
 +                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.2)</code>
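Once ''ompi_info'' lists the gridengine component, a submit script along the following lines might be used. This is only a sketch under assumptions: the parallel environment name ''orte'' and the slot count of 12 are hypothetical and depend on how SGE is configured on your cluster, while the paths and mpiblast options are copied from the benchmark commands on this page. The snippet writes the script to ''mpiblast_sge.sh'' so it can be inspected before submitting.

```shell
# Write a hypothetical SGE submit script to disk.
# Assumptions: a parallel environment named "orte" exists and 12 slots are wanted.
cat > mpiblast_sge.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -pe orte 12
# NSLOTS is set by SGE to the number of slots granted to the job.
/opt/openmpi/bin/mpirun -np "$NSLOTS" /opt/Bio/mpiblast/bin/mpiblast \
  -d drosoph.nt -i drosoph.seq -p blastn -o mpi_drosoph_result.txt
EOF

# Submit it with: qsub mpiblast_sge.sh
echo "wrote mpiblast_sge.sh"
```

With no job name set, SGE names the output file after the script, for example ''mpiblast_sge.sh.o<jobid>''.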
 +===== Benchmarks =====
 +<code>$ time blastall -d drosoph.nt -p blastn -i drosoph.seq -o drosoph.result
 +        
 +real    7m48.052s
 +user    7m40.775s
 +sys     0m6.732s</code>
 +
 +<code>$ time /opt/openmpi/bin/mpirun -np 4 /opt/Bio/mpiblast/bin/mpiblast -d drosoph.nt -i drosoph.seq -p blastn -o mpi_drosoph_result.txt
 +Total Execution Time: 395.754
 +
 +real    6m36.841s
 +user    12m13.891s
 +sys     0m56.631s</code>
 +
 +With 12 jobs submitted through SGE, mpiBLAST on 6 nodes finished in:
 +<code>$ less mpiblast_sge.sh.o5515
 +Total Execution Time: 98.3068</code>
 +
 +<code>$ time pb blastall -d alan_drosoph -p blastn -i sequences/drosoph.seq -o drosoph.result
 +                                                                               
 +real    3m6.163s
 +user    0m0.046s
 +sys     0m1.423s</code>
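To compare the runs above, the speedup over the serial ''blastall'' run can be worked out with a few lines of shell. This is a rough sketch: it mixes wall-clock ''real'' times with mpiBLAST's own "Total Execution Time" figures, which may not measure exactly the same thing, and the variable and function names are made up for illustration.

```shell
# Timings in seconds, taken from the benchmark output above.
serial=468.052   # blastall, real 7m48.052s
mpi4=395.754     # mpiblast, mpirun -np 4, Total Execution Time
sge12=98.3068    # mpiblast via SGE, 12 jobs on 6 nodes, Total Execution Time
pb=186.163       # pb blastall, real 3m6.163s

# speedup <time>: serial time divided by the given time, two decimals.
speedup() {
  awk -v s="$serial" -v t="$1" 'BEGIN { printf "%.2f", s / t }'
}

s_mpi4=$(speedup "$mpi4")
s_sge12=$(speedup "$sge12")
s_pb=$(speedup "$pb")

echo "mpirun -np 4:    ${s_mpi4}x"
echo "SGE, 12 jobs:    ${s_sge12}x"
echo "pb blastall:     ${s_pb}x"
```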
 +
 +
 +The number of processes for an MPI job should be the number of CPUs plus one, because one process acts as the master that controls the others.
 +===== Random Notes =====
 +
 +==== Number of Jobs ====
 +
 +https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2005-July/012726.html
 +
 +%%With formatdb, the number of nodes refers to compute node number.  
 +With mpiBLAST, np refers to number of processes, which isn't  
 +necessarily linked to compute node or processor number. You can run  
 +10 processes on 4 processors. But it's recommended to run a single  
 +process per processor. But the minimum number of processes for  
 +mpiBLAST is 3, no matter what your compute node number is.%%
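A small sketch of how the process count might be chosen, combining the plus-one rule noted under Benchmarks with the minimum of 3 from the post quoted above. The CPU count is a hypothetical value, and the command is only printed, not run.

```shell
# Hypothetical number of CPUs available to the job; adjust for your cluster.
ncpus=4

# One extra process for the master, as noted under Benchmarks.
np=$((ncpus + 1))

# The quoted post gives 3 as the minimum number of mpiBLAST processes.
if [ "$np" -lt 3 ]; then
  np=3
fi

echo "mpirun -np $np mpiblast -d drosoph.nt -i drosoph.seq -p blastn -o mpi_drosoph_result.txt"
```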
 +==== Number of Fragments ====
 +Rule of thumb for large databases: one fragment per gigabyte (for example, 144 fragments for a 144 GB database).
 +
 +  * http://lists.mpiblast.org/pipermail/users_lists.mpiblast.org/2009-August/000988.html
 +  * https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2008-February/029231.html
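The one-fragment-per-gigabyte rule of thumb can be turned into a quick calculation. In this sketch the database size is hard-coded to the 144 GB example, the input name ''nt'' is a placeholder, and the resulting ''mpiformatdb'' command is only printed.

```shell
# Hypothetical database size in bytes (144 GB, matching the example above).
db_bytes=$((144 * 1024 * 1024 * 1024))
gb=$((1024 * 1024 * 1024))

# Ceiling division: one fragment for every started gigabyte.
nfrags=$(( (db_bytes + gb - 1) / gb ))

echo "mpiformatdb -i nt -p F --nfrags=$nfrags"
```

For a real database file, the size could be read with something like ''db_bytes=$(wc -c < nt)''.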
 +
 +==== Incorrect mpiBLAST Version ====
 +You're not crazy: the binary reporting an older version than the installed package is a known issue, see http://lists.mpiblast.org/pipermail/users_lists.mpiblast.org/2009-February/000933.html
 +<code>$ rpmquery -a mpiblast    
 +mpiblast-1.5.0-pio
 +$ mpiblast --version
 +mpiblast version 1.4.0
 +</code>
 +
 +===== Links =====
 +  * Submitting MPI jobs using SGE: http://www.shef.ac.uk/wrgrid/documents/gridengine.html
 +  * mpiBLAST Guide: http://www.mpiblast.org/Docs/Guide
 +  * Updating the BLAST databases: http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl
 +  * Rocks documentation on mpiBLAST: http://www.rocksclusters.org/roll-documentation/bio/5.2/mpiblast_usage.html