User Tools

Site Tools


mkatari-bioinformatics-august-2013-deseq

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
mkatari-bioinformatics-august-2013-deseq [2013/08/23 15:53] mkatarimkatari-bioinformatics-august-2013-deseq [2015/08/21 14:13] (current) mkatari
Line 1: Line 1:
 [[mkatari-bioinformatics-august-2013|Back to Manny's Bioinformatics Workshop HOME]] [[mkatari-bioinformatics-august-2013|Back to Manny's Bioinformatics Workshop HOME]]
  
-Here we will discuss how to create an R script (DESeq.R) that can be executed on HPC. Majority of the script is the same as if you were running it interactively except paths to the files are replaced with variables.+Here we will discuss how to create an R script (DESeq.R) that can be executed on HPC. This script has been adapted from the DESeq manual [[ http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf|DESeq manual]]  except paths to the files are replaced with variables and the plots are being saved as pdf documents in an output directory.
  
 If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed.
Line 9: Line 9:
 source("http://bioconductor.org/biocLite.R") source("http://bioconductor.org/biocLite.R")
 biocLite("DESeq") biocLite("DESeq")
-</code> 
- 
-However to make the script easy to run for anyone on the server, we will tell the R script where exactly to look for DESeq. R uses a variable (.libPaths) to store locations where it should look for packages. We will simply add the path to this variable. This way the person running the script does not need to have DESeq installed in their local R libraries. The other option is to tell the system administrator to add the packages. This is done in the following lines of the code 
- 
-<code> 
-mannyLibPaths = '/home/mkatari/R/x86_64-unknown-linux-gnu-library/3.0' 
-.libPaths(new='/home/mkatari/R/x86_64-unknown-linux-gnu-library/3.0') 
 </code> </code>
  
Line 23: Line 16:
 library(DESeq) library(DESeq)
 </code> </code>
- 
-Now in our script we will use a function (commandArgs) that will allow us to read in arguments from command line automatically. We will use the command Rscript followed by our code (DESeq.R) and the arguments will follow. The code will read one word at a time and save it as a character vector: 
- 
-<code> 
-userargs = commandArgs(TRUE) 
-pathToCountsData = userargs[1] 
-pathToExpDesign = userargs[2] 
-pathToOutput = userargs[3] 
-</code> 
- 
-This will save all the words as a character vector in userargs. The value TRUE in the commandArgs argument make sure only the trailing arguments are saved. If the value is FALSE you will see additional R arguments when the command Rscript is executed. Notice the order of arguments is important. First we will provide the path to the count data file, then the path to the file containing the experimental design and finally the path to the directory where to save the results. 
  
 An example of the count data file is provided [[https://docs.google.com/file/d/0B172nc4dAaaOMG44Zk1BT2NFdkU/edit?usp=sharing|here]] An example of the count data file is provided [[https://docs.google.com/file/d/0B172nc4dAaaOMG44Zk1BT2NFdkU/edit?usp=sharing|here]]
Line 39: Line 21:
 First we will load the count data file. First we will load the count data file.
 <code> <code>
-counts = read.table(pathToCountsData, header=T, row.names=1)+counts = read.table("NextGenRaw.txt", header=T, row.names=1)
 </code> </code>
  
 Then we will load the experimental design. An example is provided [[https://docs.google.com/file/d/0B172nc4dAaaOaE5fTVVhUHJKazg/edit?usp=sharing|here]]: Then we will load the experimental design. An example is provided [[https://docs.google.com/file/d/0B172nc4dAaaOaE5fTVVhUHJKazg/edit?usp=sharing|here]]:
 <code> <code>
-expdesign = read.table(pathToExpDesign)+expdesign = read.table("expdesign.txt")
 </code> </code>
  
Line 71: Line 53:
 An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes. An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes.
 <code> <code>
 +# if you have replicates do the following:
 cds = estimateDispersions( cds ) cds = estimateDispersions( cds )
 +### HOWEVER, If you have NO replicates, then try this
 +cds = estimateDispersions( cds, method="blind" , sharingMode = "fit-only" )
 </code> </code>
  
 To visualize the disperson graph To visualize the disperson graph
 <code> <code>
-dispersionFile = paste(pathToOutputDir, "Dispersion.pdf", sep="")+dispersionFile = "Dispersion.pdf"
 pdf(dispersionFile) pdf(dispersionFile)
 plotDispEsts( cds ) plotDispEsts( cds )
Line 95: Line 80:
  
 <code> <code>
-pdf("MAplot.pdf")+maFile = "MAplot.pdf" 
 +pdf(maFile)
 plotMA(res) plotMA(res)
 dev.off() dev.off()
 </code> </code>
  
-#To get the genes that have FDR of 10% +#To get the genes that have FDR of 10% and save it in the output directory. 
-write.table(resSig,  +<code> 
-            pathToOutput+resSigind = res[ which(res$padj < 0.1 & res$log2FoldChange > 1), ] 
 +resSigrep = res[ which(res$padj < 0.1 & res$log2FoldChange < -1), ] 
 + 
 +indoutfile = "Deseq.indresults.txt"  
 +repoutfile = "Deseq.represults.txt"  
 + 
 +write.table(resSigind,  
 +            indoutfile
             sep="\t",              sep="\t", 
             col.names=T,              col.names=T, 
Line 108: Line 101:
             quote=F)             quote=F)
  
-#DESeq manual: http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf+write.table(resSigrep,  
 +            repoutfile,  
 +            sep="\t",  
 +            col.names=T,  
 +            row.names=F, 
 +            quote=F) 
 +</code> 
 + 
mkatari-bioinformatics-august-2013-deseq.1377273209.txt.gz · Last modified: 2013/08/23 15:53 by mkatari