mkatari-bioinformatics-august-2013-deseq
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
mkatari-bioinformatics-august-2013-deseq [2013/08/23 15:23] – mkatari | mkatari-bioinformatics-august-2013-deseq [2013/08/23 16:20] – mkatari | ||
---|---|---|---|
Line 1: | Line 1: | ||
[[mkatari-bioinformatics-august-2013|Back to Manny' | [[mkatari-bioinformatics-august-2013|Back to Manny' | ||
- | Here we will discuss how to create an R script (DESeq.R) that can be executed on HPC. Majority of the script | + | Here we will discuss how to create an R script (DESeq.R) that can be executed on HPC. This script |
If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | If you are going to run DESeq in R on your desktop you will have to make sure DESeq is already installed. | ||
Line 11: | Line 11: | ||
</ | </ | ||
- | However to make the script | + | Once we have installed |
< | < | ||
Line 24: | Line 24: | ||
</ | </ | ||
- | Now for our script we will use a function (commandArgs) that allows the R to read in arguments from command line automatically. | + | Now in our script we will use a function (commandArgs) that will allow us to read in arguments from command line automatically. |
< | < | ||
Line 33: | Line 33: | ||
</ | </ | ||
- | All the arguments provided in command line will be saved as a character vector | + | Here we are saving all the words as a character vector |
- | The input for DESeq is a matrix/ | + | An example |
- | You have to first load the file into your workspace. | + | First we will load the count data file. |
- | + | ||
- | If you are running it locally | + | |
- | < | + | |
- | counts = read.table(" | + | |
- | </ | + | |
- | If you are writing a script | + | |
< | < | ||
counts = read.table(pathToCountsData, | counts = read.table(pathToCountsData, | ||
</ | </ | ||
- | #This is simply meta-data to store information about the samples. | + | Then we will load the experimental design. An example is provided [[https:// |
- | #expdesign = data.frame( | + | < |
- | # row.names=colnames(counts), | + | |
- | # condition=c(" | + | |
- | # libType=c(" | + | |
- | #) | + | |
expdesign = read.table(pathToExpDesign) | expdesign = read.table(pathToExpDesign) | ||
+ | </ | ||
- | #The counts that were loaded as a data.frame are now used to create | + | The counts that were loaded as a data.frame are now used to create a new type of object: count data set |
- | #a new type of object-> count data set | + | < |
cds = newCountDataSet(counts, | cds = newCountDataSet(counts, | ||
+ | </ | ||
- | #Now we can perform operations on the dataset and save the results in | + | Now we can perform operations on the dataset and save the results in the same object. |
- | #the same object. | + | |
- | # | + | First lets estimate the size factor based on the number of aligned reads from each sample. |
- | #from each sample. | + | < |
cds = estimateSizeFactors(cds) | cds = estimateSizeFactors(cds) | ||
+ | </ | ||
- | #to see the size factors: | + | To see the size factors: |
+ | < | ||
sizeFactors(cds) | sizeFactors(cds) | ||
+ | </ | ||
- | #To perform a normalization you can simply use this command. | + | To perform a normalization you can simply use this command. Note that the normalized values will not be used for identifying differentially expressed genes but we can use for some downstream analysis. |
- | #Note that the normalized values will not be used for identifying | + | < |
- | #differentially expressed genes | + | |
normalized=counts( cds, normalized=TRUE ) | normalized=counts( cds, normalized=TRUE ) | ||
+ | </ | ||
- | #An important part of DESeq is to estimate dispersion. This is simply | + | An important part of DESeq is to estimate dispersion. This is simply a form of variance for the genes. |
- | #a form of variance for the genes. | + | < |
cds = estimateDispersions( cds ) | cds = estimateDispersions( cds ) | ||
+ | </ | ||
- | #To visualize the disperson graph | + | To visualize the disperson graph |
- | pdf(" | + | < |
+ | dispersionFile = paste(pathToOutputDir, | ||
+ | pdf(dispersionFile) | ||
plotDispEsts( cds ) | plotDispEsts( cds ) | ||
dev.off() | dev.off() | ||
+ | </ | ||
#To see the dispersion values which will be used for the final test | #To see the dispersion values which will be used for the final test | ||
+ | < | ||
head( fData(cds) ) | head( fData(cds) ) | ||
+ | </ | ||
- | #Finally to perform the negative binomial test on the dataset to identify | + | Finally to perform the negative binomial test on the dataset to identify differentially expressed genes. |
- | #differentially expressed genes. | + | < |
res = nbinomTest( cds, " | res = nbinomTest( cds, " | ||
+ | </ | ||
- | #An MA plot allows us to see the fold change vs level of expression. | + | An MA plot allows us to see the fold change vs level of expression. In the plot, the red points are for genes that have FDR of 10%. |
- | #In the plot, the red points are for genes that have FDR of 10%. | + | |
+ | < | ||
pdf(" | pdf(" | ||
plotMA(res) | plotMA(res) | ||
dev.off() | dev.off() | ||
+ | </ | ||
#To get the genes that have FDR of 10% | #To get the genes that have FDR of 10% | ||
Line 106: | Line 108: | ||
quote=F) | quote=F) | ||
- | #DESeq manual: http:// |
mkatari-bioinformatics-august-2013-deseq.txt · Last modified: 2015/08/21 14:13 by mkatari