mkatari-bioinformatics-august-2013-deseq
Differences
This shows you the differences between two versions of the page.
mkatari-bioinformatics-august-2013-deseq [2013/08/23 15:46] – mkatari | mkatari-bioinformatics-august-2013-deseq [2015/06/17 06:14] – mkatari | ||
---|---|---|---|
Line 1: | Line 1: | ||
[[mkatari-bioinformatics-august-2013|Back to Manny' | [[mkatari-bioinformatics-august-2013|Back to Manny' | ||
- | Here we will discuss how to create an R script (DESeq.R) that can be executed on HPC. Majority of the script | + | Here we will discuss how to create an R script (DESeq.R) that can be executed on HPC. This script |
If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | ||
Line 9: | Line 9: | ||
source(" | source(" | ||
biocLite(" | biocLite(" | ||
- | </ | ||
- | |||
- | However to make the script easy to run for anyone on the server, we will tell the R script where exactly to look for DESeq. R uses a variable (.libPaths) to store locations where it should look for packages. We will simply add the path to this variable. This way the person running the script does not need to have DESeq installed in their local R libraries. The other option is to tell the system administrator to add the packages. This is done in the following lines of the code | ||
- | |||
- | < | ||
- | mannyLibPaths = '/ | ||
- | .libPaths(new='/ | ||
</ | </ | ||
Line 24: | Line 17: | ||
</ | </ | ||
- | Now in our script we will use a function (commandArgs) that will allow us to read in arguments from command line automatically. | + | Now in our script we will use a function (commandArgs) that will allow us to read in arguments from the command line automatically. | ||
< | < | ||
Line 33: | Line 26: | ||
</ | </ | ||
- | This will save all the words as a character vector | + | Here we are saving |
An example of the count data file is provided [[https:// | An example of the count data file is provided [[https:// | ||
Line 42: | Line 35: | ||
</ | </ | ||
- | #This is simply meta-data to store information about the samples. | + | Then we will load the experimental design. An example is provided [[https:// |
- | #expdesign = data.frame( | + | < |
- | # row.names=colnames(counts), | + | |
- | # condition=c(" | + | |
- | # libType=c(" | + | |
- | #) | + | |
expdesign = read.table(pathToExpDesign) | expdesign = read.table(pathToExpDesign) | ||
+ | </ | ||
- | #The counts that were loaded as a data.frame are now used to create | + | The counts that were loaded as a data.frame are now used to create a new type of object: count data set |
- | #a new type of object-> count data set | + | < |
cds = newCountDataSet(counts, | cds = newCountDataSet(counts, | ||
+ | </ | ||
- | #Now we can perform operations on the dataset and save the results in | + | Now we can perform operations on the dataset and save the results in the same object. |
- | #the same object. | + | |
- | # | + | First let's estimate the size factor for each sample, which accounts for differences in sequencing depth (the number of aligned reads) between samples. | ||
- | #from each sample. | + | < |
cds = estimateSizeFactors(cds) | cds = estimateSizeFactors(cds) | ||
+ | </ | ||
- | #to see the size factors: | + | To see the size factors: |
+ | < | ||
sizeFactors(cds) | sizeFactors(cds) | ||
+ | </ | ||
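As a rough illustration of what the size factors represent, the base R sketch below computes them with the median-of-ratios approach that DESeq is based on, without loading DESeq itself. The toy `counts` matrix and the helper name `medianOfRatios` are made up for this example; in the actual script `estimateSizeFactors(cds)` does this work.

```r
# Toy count matrix (made-up numbers): rows are genes, columns are samples.
# sample2 was "sequenced" exactly twice as deeply as sample1.
counts <- matrix(c( 10,  20,
                   100, 200,
                    50, 100),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))

# Hypothetical helper, not part of DESeq.
medianOfRatios <- function(countMatrix) {
  # Log of the geometric mean of each gene across all samples.
  logGeoMeans <- rowMeans(log(countMatrix))
  # Genes with a zero count in any sample have an infinite log mean; skip them.
  usable <- is.finite(logGeoMeans)
  # For each sample, take the median ratio of its counts to the geometric means.
  apply(countMatrix, 2, function(sampleCounts) {
    exp(median((log(sampleCounts) - logGeoMeans)[usable]))
  })
}

sf <- medianOfRatios(counts)

# Dividing each column by its size factor puts the samples on a common scale.
normalizedToy <- sweep(counts, 2, sf, "/")
```

In this toy case the two samples differ only in depth, so after dividing by the size factors both columns hold the same values. This is the same kind of scaling that `counts( cds, normalized=TRUE )` applies.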
- | #To perform a normalization you can simply use this command. | + | To normalize the counts you can simply use this command. Note that the normalized values are not used for identifying differentially expressed genes, but we can use them for some downstream analysis. | ||
- | #Note that the normalized values will not be used for identifying | + | < |
- | #differentially expressed genes | + | |
normalized=counts( cds, normalized=TRUE ) | normalized=counts( cds, normalized=TRUE ) | ||
+ | </ | ||
- | #An important part of DESeq is to estimate dispersion. This is simply | + | An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes. |
- | #a form of variance for the genes. | + | < |
cds = estimateDispersions( cds ) | cds = estimateDispersions( cds ) | ||
+ | </ | ||
- | #To visualize the disperson graph | + | To visualize the dispersion graph: | ||
- | pdf(" | + | < |
+ | dispersionFile = paste(pathToOutputDir, | ||
+ | pdf(dispersionFile) | ||
plotDispEsts( cds ) | plotDispEsts( cds ) | ||
dev.off() | dev.off() | ||
+ | </ | ||
#To see the dispersion values which will be used for the final test | #To see the dispersion values which will be used for the final test | ||
+ | < | ||
head( fData(cds) ) | head( fData(cds) ) | ||
+ | </ | ||
- | #Finally to perform the negative binomial test on the dataset to identify | + | Finally, we perform the negative binomial test on the dataset to identify differentially expressed genes. | ||
- | #differentially expressed genes. | + | < |
res = nbinomTest( cds, " | res = nbinomTest( cds, " | ||
+ | </ | ||
- | #An MA plot allows us to see the fold change vs level of expression. | + | An MA plot shows the fold change against the level of expression. In the plot, the red points are genes that are significant at an FDR of 10%. | ||
- | #In the plot, the red points are for genes that have FDR of 10%. | + | |
- | pdf(" | + | < |
+ | maFile = paste(pathToOutputDir, | ||
+ | pdf(maFile) | ||
plotMA(res) | plotMA(res) | ||
dev.off() | dev.off() | ||
+ | </ | ||
+ | |||
+ | To get the genes that are significant at an FDR of 10% and save them in the output directory: | ||
+ | < | ||
+ | resSig = res[ which(res$padj < 0.1), ] | ||
+ | |||
+ | outfile = paste(pathToOutputDir," | ||
- | #To get the genes that have FDR of 10% | ||
write.table(resSig, | write.table(resSig, | ||
- | | + | |
sep=" | sep=" | ||
col.names=T, | col.names=T, | ||
row.names=F, | row.names=F, | ||
quote=F) | quote=F) | ||
+ | </ | ||
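The filtering and saving step can be tried without DESeq on a small stand-in table. The sketch below is only an illustration: the data frame `res` is made up (it imitates just the `id` and `padj` columns of an `nbinomTest` result), the output file name is hypothetical, and `file.path` is used here in place of the `paste` call from the script. It also shows why `which()` is useful: genes whose adjusted p-value is `NA` are dropped instead of turning into empty rows.

```r
# Made-up stand-in for the table returned by nbinomTest.
res <- data.frame(id   = c("g1", "g2", "g3", "g4"),
                  padj = c(0.01, 0.5, NA, 0.09),
                  stringsAsFactors = FALSE)

# which() returns only the positions where the test is TRUE, so NA is skipped.
resSig <- res[which(res$padj < 0.1), ]

# Without which(), the NA comparison produces an extra row filled with NA.
resNoWhich <- res[res$padj < 0.1, ]

# Assumption: a temporary directory stands in for pathToOutputDir.
outDir  <- tempdir()
outfile <- file.path(outDir, "toy-significant.txt")  # hypothetical file name

write.table(resSig, outfile,
            sep = "\t", col.names = TRUE, row.names = FALSE, quote = FALSE)

# Read the file back to confirm what was written.
roundTrip <- read.table(outfile, header = TRUE, sep = "\t",
                        stringsAsFactors = FALSE)
```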
- | #DESeq manual: http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf | + | Now the beauty of the script is that a user does not need to start R, or even know how to use R, to run the script. Simply call the script like any other command and the results will be saved in an output directory. This is also a great way to build a workflow of several R functions. The command to run the script is shown below. Feel free to put it in your sbatch file. | ||
+ | |||
+ | < | ||
+ | /export/apps/R/3.0.0/bin/Rscript | ||
+ | </ |
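To make the link between the command line and the script's variables concrete, here is a minimal sketch of how positional arguments might be checked and named. The helper `parseDESeqArgs`, the argument order, the variable name `pathToCountsData`, and the file names are all assumptions made for illustration; only `pathToExpDesign` and `pathToOutputDir` appear in the script above, so check the real order against DESeq.R itself.

```r
# Hypothetical helper: checks and names three positional arguments.
# The order (counts, experimental design, output directory) is an assumption.
parseDESeqArgs <- function(args) {
  if (length(args) != 3) {
    stop("usage: Rscript DESeq.R <counts file> <expdesign file> <output dir>")
  }
  list(pathToCountsData = args[1],   # assumed name for the counts file path
       pathToExpDesign  = args[2],
       pathToOutputDir  = args[3])
}

# In a script run with Rscript, the vector would come from the command line:
#   args <- commandArgs(trailingOnly = TRUE)
# Here stand-in values are used so the sketch can be run interactively.
args   <- c("counts.txt", "expdesign.txt", "results")
parsed <- parseDESeqArgs(args)
```

Under these assumptions, a call such as `Rscript DESeq.R counts.txt expdesign.txt results` would place `"results"` in `parsed$pathToOutputDir`. Stopping with a usage message when the argument count is wrong saves the user from a confusing error later in the analysis.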
mkatari-bioinformatics-august-2013-deseq.txt · Last modified: 2015/08/21 14:13 by mkatari