ppp
Differences
This shows you the differences between two versions of the page.
| ppp [2010/09/09 15:55] – evilliers | ppp [2019/05/09 09:50] (current) – removed aorth | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ===== Pathogen Profiling Pipeline ===== | ||
| - | |||
| - | The Pathogen Profiling Pipeline project aims to develop a metagenomics procedure independent of laboratory cultivation and a flexible bioinformatics pipeline for the rapid identification and analysis of pathogens in samples containing complex mixtures of host and microbial nucleic acids. Sequence reads derived from next-generation, high-throughput DNA sequencing (Roche GS FLX pyrosequencing) are passed through customizable metagenomic analysis pipelines, which then filter the reads and report the best taxonomic hits. | ||
| - | |||
| - | Generated raw sequences may be filtered to exclude host and normal flora, thereby facilitating pathogen searching within the large and cumbersome pyrosequencing data sets while retaining specificity. | ||
| - | available biological databases for their analysis resulting in potentially limitless configurations for data | ||
| - | analysis pipelines. | ||
| - | situations), | ||
| - | processor-intensive analyses. | ||
| - | |||
| - | |||
| - | The Pathogen Profiling Pipeline facilitates the analysis, reporting, and data management aspects of large-scale pathogen discovery projects aimed at quickly identifying candidate etiological agents in complex nucleic acid mixtures. This has improved outbreak preparedness by increasing the capacity for early recognition and containment of pathogens. | ||
| - | |||
| - | Developed by: | ||
| - | |||
| - | Tom Matthews and Gary Van Domselaar | ||
| - | |||
| - | National Microbiology Laboratory | ||
| - | Public Health Agency of Canada | ||
| - | 820 Elgin St., Winnipeg, MB, Canada R3E 3R2 | ||
| - | |||
| - | '' | ||
| - | gary.vandomselaar@gmail.com'' | ||
| - | |||
| - | ===== Installation ===== | ||
| - | |||
| - | These steps were performed after a [[upgrading_rocks|fresh installation]] of Rocks 5.2 on the HPC cluster. | ||
| - | |||
| - | From the README in ppp.tar.gz: | ||
| - | < | ||
| - | |||
| - | Compute cluster: | ||
| - | - BLAST | ||
| - | - BioPerl -- 1.5 or newer | ||
| - | - DRMAA compliant scheduler -- Sun Grid Engine suggested | ||
| - | Web server: | ||
| - | - Apache2 | ||
| - | - Mod-Perl | ||
| - | - BioPerl -- 1.5 or newer | ||
| - | - Graphviz</ | ||
| - | ==== On the head node ==== | ||
| - | The installation and configuration of the head node should have taken care of the Apache2, mod_perl, and BioPerl requirements. | ||
| - | |||
| - | We need '' | ||
| - | - Download '' | ||
| - | - '' | ||
| - | |||
| - | === Configuring perl modules === | ||
| - | |||
| - | PPP's web interface needs XML:: | ||
| - | < | ||
| - | |||
| - | === PPP's DRMAA scheduler === | ||
| - | [[http:// | ||
| - | |||
| - | - Download Schedule/ | ||
| - | - Read the README :) | ||
| - | - Prepare the environment for compiling the perl module: | ||
| - | < | ||
| - | $ export LD_LIBRARY_PATH=$SGE_ROOT/ | ||
| - | $ ln -s $SGE_ROOT/ | ||
| - | Build and install the perl module: | ||
| - | < | ||
| - | make | ||
| - | make test | ||
| - | sudo make install</ | ||
| - | |||
| - | === Install PPP === | ||
| - | PPP's Perl scripts need to be accessible to all nodes, so change to a directory that every node can reach: | ||
| - | < | ||
| - | Unzip PPP: | ||
| - | < | ||
| - | Rename it so it's less confusing: | ||
| - | < | ||
| - | Now read the README and install as per the instructions in INSTALL.PDF. In a nutshell: | ||
| - | < | ||
| - | # mkdir db scratch data</ | ||
| - | Edit the config file ('' | ||
| - | < | ||
| - | / | ||
| - | |||
| - | # | ||
| - | / | ||
| - | |||
| - | # | ||
| - | / | ||
| - | |||
| - | #rootPath | ||
| - | / | ||
| - | |||
| - | Run '' | ||
| - | < | ||
| - | |||
| - | Download and extract taxonomy databases from NCBI: | ||
| - | < | ||
| - | $ cd taxon | ||
| - | $ tar zxf taxdump.tar.gz</ | ||
| - | |||
| - | From the taxon folder, format the taxonomy databases using the '' | ||
| - | < | ||
| - | |||
| - | use Bio:: | ||
| - | use Bio:: | ||
| - | use FindBin; | ||
| - | use strict; | ||
| - | |||
| - | my $taxondir = $FindBin:: | ||
| - | |||
| - | if(!-d $taxondir) | ||
| - | { | ||
| - | die " | ||
| - | } | ||
| - | |||
| - | if(!-e " | ||
| - | { | ||
| - | print "It doesn' | ||
| - | } | ||
| - | |||
| - | my $db = new Bio:: | ||
| - | -directory => $taxondir, | ||
| - | -nodesfile => " | ||
| - | -namesfile => " | ||
| - | |||
| - | if(-e " | ||
| - | { | ||
| - | print " | ||
| - | } | ||
| - | else | ||
| - | { | ||
| - | print "There may be an error. | ||
| - | }</ | ||
| - | |||
| - | Copy the '' | ||
| - | < | ||
| - | Create a link to the ppp-backend directory in ppp-web: | ||
| - | < | ||
| - | Edit Apache' | ||
| - | < | ||
| - | < | ||
| - | AllowOverride None | ||
| - | Order allow,deny | ||
| - | allow from all | ||
| - | AddHandler perl-script cgi-script .cgi .pl | ||
| - | Options None | ||
| - | </ | ||
| - | |||
| - | < | ||
| - | AllowOverride None | ||
| - | Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch | ||
| - | Order allow,deny | ||
| - | Allow from all | ||
| - | SetHandler perl-script | ||
| - | PerlResponseHandler ModPerl:: | ||
| - | </ | ||
| - | </ | ||
| - | Restart Apache: | ||
| - | < | ||
| - | Change the permissions on everything so that Apache' | ||
| - | < | ||
| - | # chmod -R g+w *</ | ||
| - | |||
| - | Test! http:// | ||
| - | |||
| - | If you get errors, check the Apache error_log. :) | ||
| - | |||
| - | If it worked, go ahead and start the job manager: | ||
| - | < | ||
| - | # perl drmaamanager.pl | ||
| - | > Job manager initilized...</ | ||
| - | PPP's web interface should now indicate that a job server is running (green circle!). | ||
| - | |||
| - | To start the job manager and send it to the background: | ||
| - | < | ||
| - | # nohup perl drmaamanager.pl &</ | ||
| - | '' | ||
| - | |||
| - | |||
| - | ---- | ||
| - | |||
| - | === Administering PPP === | ||
| - | |||
| - | |||
| - | === Manually adding databases === | ||
| - | |||
| - | |||
| - | * Add the fasta files to your ' | ||
| - | |||
| - | * Format the database with the formatdb utility included with BLAST. The command | ||
| - | * " | ||
| - | |||
| - | example: '' | ||
| - | |||
| - | * If you would like a BioPerl index, you can also make it manually. Running " | ||
| - | * with no arguments will provide you with a perldoc page for the script, but again here' | ||
| - | * an example: | ||
| - | |||
| - | '' | ||
| - | |||
| - | === Adding Input Files === | ||
| - | |||
| - | From the Administration page, click the Upload Files button. From here, adding input files is | ||
| - | very similar to adding databases. | ||
| - | |||
| - | Again, note that you can also add the files to your data directory manually from the command line if | ||
| - | you wish. Since they don't need to be formatted, they are ready to use as soon as they | ||
| - | are placed in the appropriate directory. | ||
| - | |||
| - | === Troubleshooting === | ||
| - | |||
| - | **CHECK THIS FIRST** - If a problem occurred with the entry point script it may have locked the | ||
| - | job cache. Ensure " | ||
| - | remove the " | ||
| - | problems. | ||
| - | |||
| - | Job manager error message: //Could not contact DRM system// - Your scheduler is not | ||
| - | started. If using SGE, you need to start " | ||
| - | " | ||
| - | |||
| - | //Web front not displaying or trying to download pages// - The apache2 configuration isn' | ||
| - | properly set up. Check that the " | ||
| - | available" | ||
| - | installed. Finally, restart apache2 (apache2ctl restart). | ||
| - | |||
| - | //Submitted jobs are not picked up by job manager// - Check first that the job cache files are | ||
| - | being created in " | ||
| - | exist, it is probably a permissions problem. Ensure the apache2 web user has read/ | ||
| - | access to the " | ||
| - | |||
| - | //Filtering jobs not producing results or immediately failing// - Your paths may be set up | ||
| - | wrong in the local configuration file. Look at " | ||
| - | paths are set up correctly. Also, Sun Grid Engine may be failing. Check your | ||
| - | gridengine/ | ||
| - | |||
| - | //Jobs appear to be running but producing no results// - Again probably a permissions | ||
| - | problem. Your web server is writing the job information, | ||
| - | jobs may not have read/write permissions to the scratch folders. | ||
| - | |||
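
Several of the troubleshooting entries above come down to a user lacking read/write access to a working directory. As a minimal sketch, the check can be scripted. It assumes the directory names ''db'', ''scratch'' and ''data'' from the ''mkdir'' step in the installation notes; ''check_rw'' is a hypothetical helper written for this page, not part of PPP.

```shell
#!/bin/sh
# check_rw: hypothetical helper, not shipped with PPP.
# Reports, for each directory given, whether it exists and whether the
# user running this function can both read and write it.
# Returns 0 if every directory passes, 1 otherwise.
check_rw() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "MISSING  $dir"
            status=1
        elif [ -r "$dir" ] && [ -w "$dir" ]; then
            echo "OK       $dir"
        else
            echo "NO-RW    $dir"
            status=1
        fi
    done
    return $status
}
```

After sourcing the function, ''check_rw db scratch data'' can be run from the directory that holds those folders. The result only reflects the account that runs it, so it is worth repeating the check as the web server user and as the user that executes the scheduled jobs (for example through ''sudo -u'' with whatever account names your system uses).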
ppp.1284047706.txt.gz · Last modified: by evilliers
