mpiblast
Differences
This shows you the differences between two versions of the page.
mpiblast [2010/01/18 10:14] – 172.26.0.166 | mpiblast [2010/01/29 09:35] – 172.26.0.166 | ||
---|---|---|---|
Line 2: | Line 2: | ||
Parallel implementation of NCBI's BLAST algorithm. | Parallel implementation of NCBI's BLAST algorithm. | ||
- | http:// | + | * http:// |
+ | * OpenMPI FAQ: http:// | ||
< | < | ||
- | * **--nfrags=10** Specifies | + | * **nfrags** |
- | http:// | ||
- | ===== Updating BLAST Databases ===== | ||
- | http:// | ||
===== Notes on .ncbirc ===== | ===== Notes on .ncbirc ===== | ||
Notes on setting up the '' | Notes on setting up the '' | ||
Line 29: | Line 27: | ||
The Data variable gives the location of the NCBI Data directory containing BLOSUM and PAM scoring matrices, among other things. The scoring matrix files are necessary for any type of protein BLAST search and should be accessible by all cluster nodes. The BLASTMAT variable also specifies the path to the scoring matrices, and will usually be identical to the Data variable. The BLASTDB variable tells standard NCBI blastall (not mpiBLAST) where to find BLAST databases. As previously mentioned, the Shared and Local variables give the shared and local database paths, respectively. By setting BLASTDB to the same path as Shared, it is possible for NCBI blastall to share the same databases that mpiBLAST uses. In such a configuration, | The Data variable gives the location of the NCBI Data directory containing BLOSUM and PAM scoring matrices, among other things. The scoring matrix files are necessary for any type of protein BLAST search and should be accessible by all cluster nodes. The BLASTMAT variable also specifies the path to the scoring matrices, and will usually be identical to the Data variable. The BLASTDB variable tells standard NCBI blastall (not mpiBLAST) where to find BLAST databases. As previously mentioned, the Shared and Local variables give the shared and local database paths, respectively. By setting BLASTDB to the same path as Shared, it is possible for NCBI blastall to share the same databases that mpiBLAST uses. In such a configuration, | ||
- | |||
- | ===== wwwblast ===== | ||
- | http:// | ||
===== Frequently Asked Questions ===== | ===== Frequently Asked Questions ===== | ||
Collection of the more-helpful questions and answers from the [[http:// | Collection of the more-helpful questions and answers from the [[http:// | ||
- | |||
====How do I format a huge database? | ====How do I format a huge database? | ||
Large databases like nt can consume several gigabytes of disk space, so it is preferable to store them in compressed form. Starting with mpiBLAST 1.4.0, FASTA-formatted sequence data can be piped into mpiformatdb, which makes it possible to format a compressed (gzip, bzip, etc.) database directly with command-line syntax like: | Large databases like nt can consume several gigabytes of disk space, so it is preferable to store them in compressed form. Starting with mpiBLAST 1.4.0, FASTA-formatted sequence data can be piped into mpiformatdb, which makes it possible to format a compressed (gzip, bzip, etc.) database directly with command-line syntax like: | ||
< | < | ||
+ | |||
+ | ==== SGE Support ==== | ||
+ | See this FAQ entry: http:// | ||
+ | |||
+ | < | ||
+ | MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.2)</ | ||
+ | ===== Benchmarks ===== | ||
+ | < | ||
+ | | ||
+ | real 7m48.052s | ||
+ | user 7m40.775s | ||
+ | sys | ||
+ | |||
+ | < | ||
+ | Total Execution Time: 395.754 | ||
+ | |||
+ | real 6m36.841s | ||
+ | user 12m13.891s | ||
+ | sys | ||
+ | |||
+ | With 12 jobs submitted through SGE, mpiBLAST on 6 nodes finished in: | ||
+ | < | ||
+ | Total Execution Time: 98.3068</ | ||
+ | |||
+ | < | ||
+ | |||
+ | real 3m6.163s | ||
+ | user 0m0.046s | ||
+ | sys | ||
+ | |||
+ | |||
+ | The number of processes for an MPI job should be one more than the number of CPUs, because one process acts as the master that controls the others. | ||
+ | ===== Random Notes ===== | ||
+ | |||
+ | ==== Number of Jobs ==== | ||
+ | |||
+ | https:// | ||
+ | |||
+ | %%With formatdb, the number of nodes refers to compute node number. | ||
+ | With mpiBLAST, np refers to number of processes, which isn' | ||
+ | necessarily linked to compute node or processor number. You can run | ||
+ | 10 processes on 4 processors. But it's recommended to run a single | ||
+ | process per processor. But the minimum number of processes for | ||
+ | mpiBLAST is 3, no matter what your compute node number is.%% | ||
+ | ==== Number of Fragments ==== | ||
+ | Rule of thumb for large databases: use one segment per gigabyte (for example, 144 segments for a 144 GB database). | ||
+ | |||
+ | * http:// | ||
+ | * https:// | ||
+ | |||
+ | ==== Incorrect mpiBLAST Version ==== | ||
+ | You're not crazy, it's a known issue: http:// | ||
+ | < | ||
+ | mpiblast-1.5.0-pio | ||
+ | $ mpiblast --version | ||
+ | mpiblast version 1.4.0 | ||
+ | </ | ||
+ | ===== Links ===== | ||
+ | * Submitting MPI jobs using SGE: http:// | ||
+ | * mpiBLAST Guide: http:// | ||
+ | * Updating the BLAST databases: http:// | ||
+ | * Rocks documentation on mpiBLAST: http:// | ||
+ | * wwwblast: http:// |
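
The sizing rules noted above (one segment per gigabyte, one process per CPU plus one master, and a minimum of three processes) can be sketched as a small helper. This is only an illustration of those rules of thumb: the function names `suggest_nfrags` and `suggest_np` are hypothetical and are not part of mpiBLAST, and the numbers should be checked against your own cluster.

```python
import math

# Minimum process count quoted in the "Number of Jobs" note above.
MIN_PROCESSES = 3


def suggest_nfrags(db_size_gb: float) -> int:
    """Rule of thumb from the notes: one segment per gigabyte, rounded up."""
    if db_size_gb <= 0:
        raise ValueError("database size must be positive")
    return max(1, math.ceil(db_size_gb))


def suggest_np(cpus: int) -> int:
    """One process per CPU plus one master, never below the stated minimum."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return max(MIN_PROCESSES, cpus + 1)


if __name__ == "__main__":
    # Hypothetical inputs: a 144 GB database and 12 CPUs.
    print("segments:", suggest_nfrags(144))
    print("processes:", suggest_np(12))
```

The resulting values would be passed to the fragment-count option when formatting the database and to the process-count option of your MPI launcher; the exact flags depend on the installed versions.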