This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
|
mkatari-bioinformatics-august-2013-deseq [2013/08/23 14:18] mkatari created |
mkatari-bioinformatics-august-2013-deseq [2015/08/21 14:13] (current) mkatari |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | [[mkatari-bioinformatics-august-2013|Manny' | + | [[mkatari-bioinformatics-august-2013|Back to Manny' |
| - | Here we will discuss how to create an R script that can be executed on HPC. Majority of the script | + | Here we will discuss how to create an R script |
| If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | ||
| - | In order to install DESeq type the following: | + | In order to install DESeq type the following: |
| < | < | ||
| Line 11: | Line 11: | ||
| </ | </ | ||
| - | However for the script that is available | + | Regardless of whether you are running |
| < | < | ||
| - | mannyLibPaths = '/ | + | library(DESeq) |
| - | .libPaths(new='/ | + | |
| </ | </ | ||
| - | Regardless | + | An example |
| + | First we will load the count data file. | ||
| < | < | ||
| - | library(DESeq) | + | counts = read.table(" |
| </ | </ | ||
| - | Now for our script | + | Then we will load the experimental design. An example is provided [[https:// |
| < | < | ||
| - | userargs | + | expdesign |
| - | pathToCountsData = userargs[1] | + | |
| - | pathToExpDesign = userargs[2] | + | |
| - | pathToOutput = userargs[3] | + | |
| </ | </ | ||
| - | + | The counts that were loaded as a data.frame are now used to create a new type of object: count data set | |
| - | The input for DESeq is a matrix/ | + | < |
| - | + | ||
| - | #counts = read.table(" | + | |
| - | counts = read.table(pathToCountsData, | + | |
| - | + | ||
| - | #This is simply meta-data to store information about the samples. | + | |
| - | #expdesign = data.frame( | + | |
| - | # row.names=colnames(counts), | + | |
| - | # condition=c(" | + | |
| - | # libType=c(" | + | |
| - | #) | + | |
| - | expdesign = read.table(pathToExpDesign) | + | |
| - | + | ||
| - | #The counts that were loaded as a data.frame are now used to create | + | |
| - | #a new type of object-> count data set | + | |
| cds = newCountDataSet(counts, | cds = newCountDataSet(counts, | ||
| + | </ | ||
| - | #Now we can perform operations on the dataset and save the results in | + | Now we can perform operations on the dataset and save the results in the same object. |
| - | #the same object. | + | |
| - | # | + | First lets estimate the size factor based on the number of aligned reads from each sample. |
| - | #from each sample. | + | < |
| cds = estimateSizeFactors(cds) | cds = estimateSizeFactors(cds) | ||
| + | </ | ||
| - | #to see the size factors: | + | To see the size factors: |
| + | < | ||
| sizeFactors(cds) | sizeFactors(cds) | ||
| + | </ | ||
| - | #To perform a normalization you can simply use this command. | + | To perform a normalization you can simply use this command. Note that the normalized values will not be used for identifying differentially expressed genes but we can use for some downstream analysis. |
| - | #Note that the normalized values will not be used for identifying | + | < |
| - | #differentially expressed genes | + | |
| normalized=counts( cds, normalized=TRUE ) | normalized=counts( cds, normalized=TRUE ) | ||
| + | </ | ||
| - | #An important part of DESeq is to estimate dispersion. This is simply | + | An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes. |
| - | #a form of variance for the genes. | + | < |
| + | # if you have replicates do the following: | ||
| cds = estimateDispersions( cds ) | cds = estimateDispersions( cds ) | ||
| + | ### HOWEVER, If you have NO replicates, then try this | ||
| + | cds = estimateDispersions( cds, method=" | ||
| + | </ | ||
| - | #To visualize the disperson graph | + | To visualize the disperson graph |
| - | pdf(" | + | < |
| + | dispersionFile = " | ||
| + | pdf(dispersionFile) | ||
| plotDispEsts( cds ) | plotDispEsts( cds ) | ||
| dev.off() | dev.off() | ||
| + | </ | ||
| #To see the dispersion values which will be used for the final test | #To see the dispersion values which will be used for the final test | ||
| + | < | ||
| head( fData(cds) ) | head( fData(cds) ) | ||
| + | </ | ||
| - | #Finally to perform the negative binomial test on the dataset to identify | + | Finally to perform the negative binomial test on the dataset to identify differentially expressed genes. |
| - | #differentially expressed genes. | + | < |
| res = nbinomTest( cds, " | res = nbinomTest( cds, " | ||
| + | </ | ||
| - | #An MA plot allows us to see the fold change vs level of expression. | + | An MA plot allows us to see the fold change vs level of expression. In the plot, the red points are for genes that have FDR of 10%. |
| - | #In the plot, the red points are for genes that have FDR of 10%. | + | |
| - | pdf(" | + | < |
| + | maFile = " | ||
| + | pdf(maFile) | ||
| plotMA(res) | plotMA(res) | ||
| dev.off() | dev.off() | ||
| + | </ | ||
| - | #To get the genes that have FDR of 10% | + | #To get the genes that have FDR of 10% and save it in the output directory. |
| - | write.table(resSig, | + | < |
| - | | + | resSigind = res[ which(res$padj < 0.1 & res$log2FoldChange > 1), ] |
| + | resSigrep = res[ which(res$padj < 0.1 & res$log2FoldChange < -1), ] | ||
| + | |||
| + | indoutfile = " | ||
| + | repoutfile = " | ||
| + | |||
| + | write.table(resSigind, | ||
| + | | ||
| sep=" | sep=" | ||
| col.names=T, | col.names=T, | ||
| Line 97: | Line 101: | ||
| quote=F) | quote=F) | ||
| - | #DESeq manual: http://www.bioconductor.org/packages/ | + | write.table(resSigrep, |
| + | repoutfile, | ||
| + | sep=" | ||
| + | col.names=T, | ||
| + | row.names=F, | ||
| + | quote=F) | ||
| + | </code> | ||
| + | |||