mkatari-bioinformatics-august-2013-deseq
Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| mkatari-bioinformatics-august-2013-deseq [2013/08/23 14:41] – mkatari | mkatari-bioinformatics-august-2013-deseq [2015/08/21 14:13] (current) – mkatari | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| [[mkatari-bioinformatics-august-2013|Back to Manny' | [[mkatari-bioinformatics-august-2013|Back to Manny' | ||
| - | Here we will discuss how to create an R script that can be executed on HPC. Majority of the script | + | Here we will discuss how to create an R script |
| If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | ||
| - | In order to install DESeq type the following: | + | In order to install DESeq type the following: |
| < | < | ||
| source(" | source(" | ||
| biocLite(" | biocLite(" | ||
| - | </ | ||
| - | |||
| - | However for the script that is available on HPC the script will automatically find DESeq. R uses variables to store locations where it should look for packages. Here we can simply add a path to where a specific module is located. This will prevent the need for others to have to install the module themselves. This is done int he following lines of the code | ||
| - | |||
| - | < | ||
| - | mannyLibPaths = '/ | ||
| - | .libPaths(new='/ | ||
| </ | </ | ||
| Line 24: | Line 17: | ||
| </ | </ | ||
| - | Now for our script we will use a command that allows | + | An example of the count data file is provided [[https:// |
| + | First we will load the count data file. | ||
| < | < | ||
| - | userargs | + | counts |
| - | pathToCountsData | + | |
| - | pathToExpDesign = userargs[2] | + | |
| - | pathToOutput = userargs[3] | + | |
| </ | </ | ||
| - | + | Then we will load the experimental design. An example is provided [[https:// | |
| - | The input for DESeq is a matrix/ | + | |
| - | + | ||
| - | You have to first load the file into your workspace. | + | |
| - | + | ||
| - | If you are running it locally | + | |
| < | < | ||
| - | counts | + | expdesign |
| </ | </ | ||
| - | If you are writing a script | ||
| - | counts = read.table(pathToCountsData, | ||
| - | #This is simply meta-data to store information about the samples. | + | The counts that were loaded as a data.frame are now used to create a new type of object: count data set |
| - | #expdesign = data.frame( | + | < |
| - | # row.names=colnames(counts), | + | |
| - | # condition=c(" | + | |
| - | # libType=c(" | + | |
| - | #) | + | |
| - | expdesign = read.table(pathToExpDesign) | + | |
| - | + | ||
| - | #The counts that were loaded as a data.frame are now used to create | + | |
| - | #a new type of object-> count data set | + | |
| cds = newCountDataSet(counts, | cds = newCountDataSet(counts, | ||
| + | </ | ||
| - | #Now we can perform operations on the dataset and save the results in | + | Now we can perform operations on the dataset and save the results in the same object. |
| - | #the same object. | + | |
| - | # | + | First lets estimate the size factor based on the number of aligned reads from each sample. |
| - | #from each sample. | + | < |
| cds = estimateSizeFactors(cds) | cds = estimateSizeFactors(cds) | ||
| + | </ | ||
| - | #to see the size factors: | + | To see the size factors: |
| + | < | ||
| sizeFactors(cds) | sizeFactors(cds) | ||
| + | </ | ||
| - | #To perform a normalization you can simply use this command. | + | To perform a normalization you can simply use this command. Note that the normalized values will not be used for identifying differentially expressed genes but we can use for some downstream analysis. |
| - | #Note that the normalized values will not be used for identifying | + | < |
| - | #differentially expressed genes | + | |
| normalized=counts( cds, normalized=TRUE ) | normalized=counts( cds, normalized=TRUE ) | ||
| + | </ | ||
| - | #An important part of DESeq is to estimate dispersion. This is simply | + | An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes. |
| - | #a form of variance for the genes. | + | < |
| + | # if you have replicates do the following: | ||
| cds = estimateDispersions( cds ) | cds = estimateDispersions( cds ) | ||
| + | ### HOWEVER, If you have NO replicates, then try this | ||
| + | cds = estimateDispersions( cds, method=" | ||
| + | </ | ||
| - | #To visualize the disperson graph | + | To visualize the disperson graph |
| - | pdf(" | + | < |
| + | dispersionFile = " | ||
| + | pdf(dispersionFile) | ||
| plotDispEsts( cds ) | plotDispEsts( cds ) | ||
| dev.off() | dev.off() | ||
| + | </ | ||
| #To see the dispersion values which will be used for the final test | #To see the dispersion values which will be used for the final test | ||
| + | < | ||
| head( fData(cds) ) | head( fData(cds) ) | ||
| + | </ | ||
| - | #Finally to perform the negative binomial test on the dataset to identify | + | Finally to perform the negative binomial test on the dataset to identify differentially expressed genes. |
| - | #differentially expressed genes. | + | < |
| res = nbinomTest( cds, " | res = nbinomTest( cds, " | ||
| + | </ | ||
| - | #An MA plot allows us to see the fold change vs level of expression. | + | An MA plot allows us to see the fold change vs level of expression. In the plot, the red points are for genes that have FDR of 10%. |
| - | #In the plot, the red points are for genes that have FDR of 10%. | + | |
| - | pdf(" | + | < |
| + | maFile = " | ||
| + | pdf(maFile) | ||
| plotMA(res) | plotMA(res) | ||
| dev.off() | dev.off() | ||
| + | </ | ||
| - | #To get the genes that have FDR of 10% | + | #To get the genes that have FDR of 10% and save it in the output directory. |
| - | write.table(resSig, | + | < |
| - | | + | resSigind = res[ which(res$padj < 0.1 & res$log2FoldChange > 1), ] |
| + | resSigrep = res[ which(res$padj < 0.1 & res$log2FoldChange < -1), ] | ||
| + | |||
| + | indoutfile = " | ||
| + | repoutfile = " | ||
| + | |||
| + | write.table(resSigind, | ||
| + | | ||
| sep=" | sep=" | ||
| col.names=T, | col.names=T, | ||
| Line 103: | Line 101: | ||
| quote=F) | quote=F) | ||
| - | #DESeq manual: http://www.bioconductor.org/packages/ | + | write.table(resSigrep, |
| + | repoutfile, | ||
| + | sep=" | ||
| + | col.names=T, | ||
| + | row.names=F, | ||
| + | quote=F) | ||
| + | </code> | ||
| + | |||
mkatari-bioinformatics-august-2013-deseq.1377268875.txt.gz · Last modified: by mkatari
