ppp

ppp [2009/09/28 07:48] 172.26.0.166ppp [2019/05/09 09:50] (current) – removed aorth
-===== Pathogen Profiling Pipeline ===== 
  
-The Pathogen Profiling Pipeline project aims to develop a metagenomics procedure independent of laboratory cultivation and a flexible bioinformatics pipeline for the rapid identification and analysis of pathogens in samples containing complex mixtures of host and microbial nucleic acids. Sequence reads derived from next generation high throughput DNA sequencing (Roche GS FLX pyrosequencing) technology are passed through customizable metagenomic analysis pipelines, which subsequently filter and report on the best taxonomic hits.   
- 
-Generated raw sequences may be filtered to exclude host and normal flora, thereby facilitating pathogen searching within the large and cumbersome pyrosequencing data sets while retaining specificity.  Researchers can upload any  
-available biological databases for their analysis resulting in potentially limitless configurations for data  
-analysis pipelines.  For maximum throughput within time constraints (such as in emergency response  
-situations), the application may run on a high performance parallel computing cluster to distribute the  
-processor-intensive analyses.  
- 
- 
-The Pathogen Profiling Pipeline facilitates the analysis, reporting, and data management aspects of large-scale pathogen discovery projects aimed at quickly identifying candidate etiological agents in complex nucleic acid mixtures. This has improved outbreak preparedness by increasing the capacity for early recognition and containment of pathogens.
- 
-Developed by: 
- 
-Tom Matthews and Gary Van Domselaar 
- 
-National Microbiology Laboratory 
-Public Health Agency of Canada 
-820 Elgin St., Winnipeg, MB, Canada R3E 3R2 
- 
-''t.c.matthews@gmail.com,  
-gary.vandomselaar@gmail.com'' 
- 
-===== Installation ===== 
- 
-After a [[upgrading_rocks|fresh installation]] of Rocks 5.2 on the HPC cluster, download PPP: http://www.corefacility.ca/ppp
- 
-From the README in ppp.tar.gz: 
-<file>SOFTWARE REQUIREMENTS 
- 
-    Compute cluster: 
-        - BLAST 
-        - BioPerl -- 1.5 or newer 
-        - DRMAA compliant scheduler -- Sun Grid Engine suggested 
-    Web server: 
-        - Apache2 
-        - Mod-Perl 
-        - BioPerl -- 1.5 or newer 
-        - Graphviz</file> 
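Before going further, it can help to confirm which of these tools are already on the ''PATH''. The following is a minimal sketch, not part of PPP: the function name ''check_commands'' is made up for illustration, and the command names in the example call (''blastall'', ''formatdb'', ''dot'') are assumptions about how BLAST and Graphviz are installed on your system.

```shell
#!/bin/sh
# Report which of the named commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and returns
# non-zero if at least one command was not found.
# (check_commands is a hypothetical helper, not shipped with PPP.)
check_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (command names are assumptions; adjust to your installation):
#   check_commands perl blastall formatdb dot
```

A non-zero return only means a command is not on the current ''PATH''; on Rocks the tool may still be installed in one of the Rocks-specific directories.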
- 
-==== On the head node ==== 
-The installation and configuration of the head node should have taken care of the Apache2, mod_perl, and BioPerl requirements.  See the [[upgrading_rocks]] page if you haven't satisfied those yet. 
- 
-We also need ''graphviz''.  Rocks installs a copy, but it lives in the Rocks-specific directories, so install another copy system-wide by following the instructions on the Graphviz website for CentOS/RedHat Enterprise Linux: http://www.graphviz.org/Download_linux_rhel.php
-  - Download ''graphviz-rhel.repo'' and copy it to ''/etc/yum.repos.d/'' 
-  - ''yum install 'graphviz*''' 
- 
-=== Configuring perl modules === 
- 
-PPP's web interface needs XML::Simple, which is in yum: 
-<code># yum install perl-XML-Simple</code> 
- 
-=== PPP's DRMAA scheduler === 
-[[http://en.wikipedia.org/wiki/DRMAA|DRMAA]] is an API for job scheduling.  Sun Grid Engine is DRMAA compliant, but using it from Perl requires a Perl module.  We will compile the module from source because the build needs to be told where to find the C headers for SGE's DRMAA support.
- 
-  - Download Schedule/DRMAAc from CPAN: http://search.cpan.org/CPAN/authors/id/T/TH/THARSCH/Schedule-DRMAAc-0.81.tar.gz 
-  - Read the README :) 
-  - Prepare the environment for compiling the perl module: 
-<code>$ source /opt/gridengine/default/common/settings.sh 
-$ export LD_LIBRARY_PATH=$SGE_ROOT/lib/`$SGE_ROOT/util/arch` 
-$ ln -s $SGE_ROOT/include/drmaa.h</code> 
-Build and install the perl module: 
-<code>perl Makefile.PL 
-make 
-make test 
-sudo make install</code> 
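If the build fails, a common cause is that the header or the shared library is not where the environment above points. The check below is a sketch under assumptions: ''check_drmaa_files'' is a hypothetical helper, and the library file name ''libdrmaa.so'' is assumed rather than taken from the README, so verify it against your SGE installation.

```shell
#!/bin/sh
# Check that the DRMAA header and shared library exist under an SGE root.
#   $1 = SGE root directory (e.g. the value of $SGE_ROOT)
#   $2 = architecture string (e.g. the output of $SGE_ROOT/util/arch)
# Prints one "not found: PATH" line per missing file and returns
# non-zero if anything is missing.
# Assumption: the library is named libdrmaa.so; adjust if yours differs.
check_drmaa_files() {
    root=$1
    arch=$2
    rc=0
    for f in "$root/include/drmaa.h" "$root/lib/$arch/libdrmaa.so"; do
        if [ ! -e "$f" ]; then
            echo "not found: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example call, after sourcing settings.sh:
#   check_drmaa_files "$SGE_ROOT" "$("$SGE_ROOT/util/arch")"
```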
- 
-=== Install PPP === 
-PPP's perl scripts need to be accessible to all nodes, so change directory to somewhere accessible to the nodes: 
-<code># cd /mnt/export3</code> 
-Unzip PPP: 
-<code># tar -zxf ~alan/src/ppp.tar.gz</code> 
-Rename it so it's less confusing:
-<code># mv ppp PathogenPP</code> 
-Now read the README and follow the installation instructions in INSTALL.PDF.  In a nutshell:
-<code># cd PathogenPP/ppp-backend
-# mkdir db scratch data</code>
-Edit the config file (''conf/local.conf'') to reflect the locations of software in your installation, most importantly: 
-<file>#blast_loc 
-/opt/Bio/ncbi/bin/blastall 
- 
-#formatdb_loc 
-/opt/Bio/ncbi/bin/formatdb 
- 
-#bp_index_loc 
-/usr/bin/bp_index.pl 
- 
-#rootPath 
-/mnt/export3/PathogenPP/ppp-backend/</file> 
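A typo in one of these paths is easy to miss, so it may be worth checking them before starting PPP. The sketch below assumes the layout shown above (a ''#key'' line followed by the value on the next line, with no trailing whitespace); ''conf_get'' and ''check_conf_paths'' are hypothetical helper names, not part of PPP.

```shell
#!/bin/sh
# Print the value stored under "#KEY" in a PPP-style config file,
# i.e. the line that immediately follows a line consisting exactly of "#KEY".
# Prints nothing if the key is absent.
conf_get() {
    key=$1
    file=$2
    awk -v k="#$key" 'found { print; exit } $0 == k { found = 1 }' "$file"
}

# For each named key, report it if its value is empty or does not
# exist as a path on this machine. Returns non-zero on any problem.
check_conf_paths() {
    file=$1
    shift
    rc=0
    for key in "$@"; do
        p=$(conf_get "$key" "$file")
        if [ -z "$p" ] || [ ! -e "$p" ]; then
            echo "$key: missing or not found: $p"
            rc=1
        fi
    done
    return "$rc"
}

# Example call, run from the ppp-backend directory:
#   check_conf_paths conf/local.conf blast_loc formatdb_loc bp_index_loc rootPath
```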
- 
-Run ''bin/customjob.pl''.  If there are no errors, save its output into the ''conf/customjob.xml'' file:
-<code># perl bin/customjob.pl > conf/customjob.xml</code> 
-Copy the ''ppp-web'' folder to Apache's document root: 
-<code># cp -R ppp-web/ /var/www/html/</code> 
-Create a link to the ppp-backend directory in ppp-web: 
-<code># ln -s /mnt/export3/PathogenPP/ppp-backend /var/www/html/ppp-web/ppp</code> 
-Configure Apache to run PPP's scripts through mod_perl.  Create a new file ''/etc/httpd/conf.d/ppp-web.conf'':
-<file><IfModule mod_perl.c> 
-        <Directory "/var/www/html/ppp-web"> 
-                AllowOverride None 
-                Order allow,deny 
-                allow from all 
-                AddHandler perl-script cgi-script .cgi .pl 
-                Options None 
-        </Directory> 
- 
-        <Directory "/var/www/html/ppp-web/cgi-bin"> 
-                AllowOverride None 
-                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch 
-                Order allow,deny 
-                Allow from all 
-                SetHandler perl-script 
-                PerlResponseHandler ModPerl::Registry 
-        </Directory> 
-</IfModule></file> 
- 
-Change the permissions on everything so that Apache's user can read/write: 
- 
-Start PPP's drmaa job server... 
-<code># perl drmaamanager.pl 
-> Job manager initilized...</code> 
-PPP's web interface should now indicate that a job server is running (green circle!).
- 
-==== On a compute node ==== 
-BioPerl is required on the compute nodes, but it has so many dependencies that installing it through CPAN as a batch job is impractical.  Instead, we'll configure and install it on one node, then copy the working installation to the other nodes (they run the same OS on the same hardware, so this is safe).
- 
-Manually tell CPAN to follow module dependencies during installation: 
-<code># perl -MCPAN -e shell 
-cpan> o conf prerequisites_policy follow 
-cpan> o conf commit</code> 
- 
- 
- 
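-From the node on which Bio::Perl is installed, copy the Perl library tree to shared storage:
-<code>$ sudo cp -R /usr/lib/perl5/ /mnt/export/perl5</code>
-From another node, sync the tree into place:
-<code>$ sudo rsync -av --delete /mnt/export/perl5 /usr/lib/</code>
-Test the result with a small script that does nothing but load Bio::Perl:
-<code>$ cat bioperltest.pl
-#!/usr/bin/perl
-
-use Bio::Perl;
-
-exit;</code>
-<code>$ perl bioperltest.pl</code>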
-If that worked, then Bio::Perl is installed fine on the node, so run the same thing on all the nodes (as root): 
-<code># rocks run host 'rsync -av --delete /mnt/export/perl5 /usr/lib'</code> 
-Now test the perl script on all the nodes.  If it is successful you should see no output: 
-<code># rocks run host 'perl ~alan/bioperltest.pl'</code> 
-Delete the temporary perl directory, as it is no longer needed:
-<code># rm -rf /mnt/export/perl5/</code>
ppp.1254124130.txt.gz · Last modified: (external edit)