mkatari-bioinformatics-august-2013-deseq
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
mkatari-bioinformatics-august-2013-deseq [2013/08/23 14:18] – created mkatari | mkatari-bioinformatics-august-2013-deseq [2015/08/21 14:13] (current) – mkatari | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | [[mkatari-bioinformatics-august-2013|Manny' | + | [[mkatari-bioinformatics-august-2013|Back to Manny' |
- | Here we will discuss how to create an R script that can be executed on HPC. Majority of the script | + | Here we will discuss how to create an R script |
If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | ||
- | In order to install DESeq type the following: | + | In order to install DESeq type the following: |
< | < | ||
Line 11: | Line 11: | ||
</ | </ | ||
- | However for the script that is available | + | Regardless of whether you are running |
< | < | ||
- | mannyLibPaths = '/ | + | library(DESeq) |
- | .libPaths(new='/ | + | |
</ | </ | ||
- | Regardless | + | An example |
+ | First we will load the count data file. | ||
< | < | ||
- | library(DESeq) | + | counts = read.table(" |
</ | </ | ||
- | Now for our script | + | Then we will load the experimental design. An example is provided [[https:// |
< | < | ||
- | userargs | + | expdesign |
- | pathToCountsData = userargs[1] | + | |
- | pathToExpDesign = userargs[2] | + | |
- | pathToOutput = userargs[3] | + | |
</ | </ | ||
- | + | The counts that were loaded as a data.frame are now used to create a new type of object: count data set | |
- | The input for DESeq is a matrix/ | + | < |
- | + | ||
- | #counts = read.table(" | + | |
- | counts = read.table(pathToCountsData, | + | |
- | + | ||
- | #This is simply meta-data to store information about the samples. | + | |
- | #expdesign = data.frame( | + | |
- | # row.names=colnames(counts), | + | |
- | # condition=c(" | + | |
- | # libType=c(" | + | |
- | #) | + | |
- | expdesign = read.table(pathToExpDesign) | + | |
- | + | ||
- | #The counts that were loaded as a data.frame are now used to create | + | |
- | #a new type of object-> count data set | + | |
cds = newCountDataSet(counts, | cds = newCountDataSet(counts, | ||
+ | </ | ||
- | #Now we can perform operations on the dataset and save the results in | + | Now we can perform operations on the dataset and save the results in the same object. |
- | #the same object. | + | |
- | # | + | First lets estimate the size factor based on the number of aligned reads from each sample. |
- | #from each sample. | + | < |
cds = estimateSizeFactors(cds) | cds = estimateSizeFactors(cds) | ||
+ | </ | ||
- | #to see the size factors: | + | To see the size factors: |
+ | < | ||
sizeFactors(cds) | sizeFactors(cds) | ||
+ | </ | ||
- | #To perform a normalization you can simply use this command. | + | To perform a normalization you can simply use this command. Note that the normalized values will not be used for identifying differentially expressed genes but we can use for some downstream analysis. |
- | #Note that the normalized values will not be used for identifying | + | < |
- | #differentially expressed genes | + | |
normalized=counts( cds, normalized=TRUE ) | normalized=counts( cds, normalized=TRUE ) | ||
+ | </ | ||
- | #An important part of DESeq is to estimate dispersion. This is simply | + | An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes. |
- | #a form of variance for the genes. | + | < |
+ | # if you have replicates do the following: | ||
cds = estimateDispersions( cds ) | cds = estimateDispersions( cds ) | ||
+ | ### HOWEVER, If you have NO replicates, then try this | ||
+ | cds = estimateDispersions( cds, method=" | ||
+ | </ | ||
- | #To visualize the disperson graph | + | To visualize the disperson graph |
- | pdf(" | + | < |
+ | dispersionFile = " | ||
+ | pdf(dispersionFile) | ||
plotDispEsts( cds ) | plotDispEsts( cds ) | ||
dev.off() | dev.off() | ||
+ | </ | ||
#To see the dispersion values which will be used for the final test | #To see the dispersion values which will be used for the final test | ||
+ | < | ||
head( fData(cds) ) | head( fData(cds) ) | ||
+ | </ | ||
- | #Finally to perform the negative binomial test on the dataset to identify | + | Finally to perform the negative binomial test on the dataset to identify differentially expressed genes. |
- | #differentially expressed genes. | + | < |
res = nbinomTest( cds, " | res = nbinomTest( cds, " | ||
+ | </ | ||
- | #An MA plot allows us to see the fold change vs level of expression. | + | An MA plot allows us to see the fold change vs level of expression. In the plot, the red points are for genes that have FDR of 10%. |
- | #In the plot, the red points are for genes that have FDR of 10%. | + | |
- | pdf(" | + | < |
+ | maFile = " | ||
+ | pdf(maFile) | ||
plotMA(res) | plotMA(res) | ||
dev.off() | dev.off() | ||
+ | </ | ||
- | #To get the genes that have FDR of 10% | + | #To get the genes that have FDR of 10% and save it in the output directory. |
- | write.table(resSig, | + | < |
- | | + | resSigind = res[ which(res$padj < 0.1 & res$log2FoldChange > 1), ] |
+ | resSigrep = res[ which(res$padj < 0.1 & res$log2FoldChange < -1), ] | ||
+ | |||
+ | indoutfile = " | ||
+ | repoutfile = " | ||
+ | |||
+ | write.table(resSigind, | ||
+ | | ||
sep=" | sep=" | ||
col.names=T, | col.names=T, | ||
Line 97: | Line 101: | ||
quote=F) | quote=F) | ||
- | #DESeq manual: http://www.bioconductor.org/packages/ | + | write.table(resSigrep, |
+ | repoutfile, | ||
+ | sep=" | ||
+ | col.names=T, | ||
+ | row.names=F, | ||
+ | quote=F) | ||
+ | </code> | ||
+ |
mkatari-bioinformatics-august-2013-deseq.1377267488.txt.gz · Last modified: 2013/08/23 14:18 by mkatari