User Tools

Site Tools


mkatari-bioinformatics-august-2013-deseq

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
mkatari-bioinformatics-august-2013-deseq [2013/08/23 16:26] mkatarimkatari-bioinformatics-august-2013-deseq [2015/08/21 14:13] (current) mkatari
Line 9: Line 9:
 source("http://bioconductor.org/biocLite.R") source("http://bioconductor.org/biocLite.R")
 biocLite("DESeq") biocLite("DESeq")
-</code> 
- 
-Once we have installed the DESeq package in our local R repository we need to let our script know to look in this directory so if anyone wants to run our script, it will know exactly where to look for DESeq. R uses a variable (.libPaths) to store locations where it should look for packages. By default R will look in the global R library and the user's local path (if one exists). We will simply add our R library path to this variable. The other option is to tell the system administrator to add the packages to the global library path. This is done in the following lines of the code: 
- 
-<code> 
-mannyLibPaths = '/home/mkatari/R/x86_64-unknown-linux-gnu-library/3.0' 
-.libPaths(new='/home/mkatari/R/x86_64-unknown-linux-gnu-library/3.0') 
 </code> </code>
  
Line 23: Line 16:
 library(DESeq) library(DESeq)
 </code> </code>
- 
-Now in our script we will use a function (commandArgs) that will allow us to read in arguments from command line automatically. In order to run our script the user will simply call our script using Rscript followed by our script (DESeq.R) and the arguments. The code will read in all the words that follow our script name one word at a time and save it as a character vector: 
- 
-<code> 
-userargs = commandArgs(TRUE) 
-pathToCountsData = userargs[1] 
-pathToExpDesign = userargs[2] 
-pathToOutput = userargs[3] 
-</code> 
- 
-Here we are saving all the words as a character vector called userargs. The value TRUE in the commandArgs argument is to make sure only the trailing arguments are saved. If the value is FALSE you will see additional R arguments when the command Rscript is executed. Notice the order of arguments is important. First we will provide the path to the count data file, then the path to the file containing the experimental design and finally the path to the directory where to save the results (The directory must contain a trailing /. 
  
 An example of the count data file is provided [[https://docs.google.com/file/d/0B172nc4dAaaOMG44Zk1BT2NFdkU/edit?usp=sharing|here]] An example of the count data file is provided [[https://docs.google.com/file/d/0B172nc4dAaaOMG44Zk1BT2NFdkU/edit?usp=sharing|here]]
Line 39: Line 21:
 First we will load the count data file. First we will load the count data file.
 <code> <code>
-counts = read.table(pathToCountsData, header=T, row.names=1)+counts = read.table("NextGenRaw.txt", header=T, row.names=1)
 </code> </code>
  
 Then we will load the experimental design. An example is provided [[https://docs.google.com/file/d/0B172nc4dAaaOaE5fTVVhUHJKazg/edit?usp=sharing|here]]: Then we will load the experimental design. An example is provided [[https://docs.google.com/file/d/0B172nc4dAaaOaE5fTVVhUHJKazg/edit?usp=sharing|here]]:
 <code> <code>
-expdesign = read.table(pathToExpDesign)+expdesign = read.table("expdesign.txt")
 </code> </code>
  
Line 71: Line 53:
 An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes. An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes.
 <code> <code>
 +# if you have replicates do the following:
 cds = estimateDispersions( cds ) cds = estimateDispersions( cds )
 +### HOWEVER, If you have NO replicates, then try this
 +cds = estimateDispersions( cds, method="blind" , sharingMode = "fit-only" )
 </code> </code>
  
 To visualize the disperson graph To visualize the disperson graph
 <code> <code>
-dispersionFile = paste(pathToOutputDir, "Dispersion.pdf", sep="")+dispersionFile = "Dispersion.pdf"
 pdf(dispersionFile) pdf(dispersionFile)
 plotDispEsts( cds ) plotDispEsts( cds )
Line 95: Line 80:
  
 <code> <code>
-maFile = paste(pathToOutputDir, "MAplot.pdf", sep="")+maFile = "MAplot.pdf"
 pdf(maFile) pdf(maFile)
 plotMA(res) plotMA(res)
Line 103: Line 88:
 #To get the genes that have FDR of 10% and save it in the output directory. #To get the genes that have FDR of 10% and save it in the output directory.
 <code> <code>
-resSig = res[ which(res$padj < 0.1), ]+resSigind = res[ which(res$padj < 0.1 & res$log2FoldChange > 1), ] 
 +resSigrep = res[ which(res$padj < 0.1 & res$log2FoldChange < -1), ]
  
-outfile paste(pathToOutputDir,"Deseq.results.txt", sep=""+indoutfile = "Deseq.indresults.txt"  
 +repoutfile = "Deseq.represults.txt
  
-write.table(resSig,  +write.table(resSigind,  
-            outfile+            indoutfile,  
 +            sep="\t",  
 +            col.names=T,  
 +            row.names=F, 
 +            quote=F) 
 + 
 +write.table(resSigrep,  
 +            repoutfile
             sep="\t",              sep="\t", 
             col.names=T,              col.names=T, 
Line 115: Line 109:
 </code> </code>
  
-Now the beauty of the script is that a user does not need to start R or even know how to use R to run the script. Simply call the script as any command and the results will be saved in an output directory. This is also great way to build a workflow of several R functions. The command to run the script is shown below. Feel free to put it in your sbatch file. 
  
-<code> 
-/export/apps/R/3.0.0/bin/Rscript /home/mkatari/Rscripts/DESeq.R /home/mkatari/testDeseq/NextGenRaw.txt /home/mkatari/testDeseq/expdesign.txt /home/mkatari/testDeseq/ 
-</code> 
mkatari-bioinformatics-august-2013-deseq.1377275219.txt.gz · Last modified: 2013/08/23 16:26 by mkatari