mkatari-bioinformatics-august-2013-clustering
This is an old revision of the document!
Back to Manny's Bioinformatics Workshop HOME
Clustering rna-seq data
continuation from DESeq
Get the significant genes
sigGenes = resSig$id
Get the normalized values for the significant genes
sigGenes.normalized = normalized[sigGenes,]
Correlate the normalized values
sigGenes.normalized.cor = cor(t(sigGenes.normalized), method="pearson")
Convert correlation into distance
sigGenes.normalized.dist = as.dist(1-sigGenes.normalized.cor)
Cluster the genes based on their distance
sigGenes.normalized.hclust = hclust(sigGenes.normalized.dist, method="average")
We expect two different outcomes here so we can simply tell it to use a cutoff that will result in two groups
sigGenes.hclust.k2<-cutree(sigGenes.normalized.hclust, k=2)
Load silhouette library
library(cluster)
Calculate silhouette values
sigGenes.hclust.k2.sil<-silhouette(sigGenes.hclust.k2, sigGenes.normalized.dist)
Plot the silhouette
plot(sigGenes.hclust.k2.sil)
Heatmap
library(gplots)
These functions will make it easy for us to specify how we want the clustering to be performed in the heatmap function
hclust2 <- function(x, method="average", ...) { hclust(x, method=method, ...) } dist2 <- function(x, ...) { as.dist(1-cor(t(x), method="pearson")) }
Create heatmap. We can save it to a pdf file
pdf("heatmap.pdf") heatmap.2(sigGenes.normalized, col=redgreen(75), hclustfun=hclust2, distfun=dist2, scale="row", cexCol=0.6, Colv=TRUE, sepcolor="black", dendrogram="both", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.4) dev.off()
mkatari-bioinformatics-august-2013-clustering.1381325481.txt.gz · Last modified: 2013/10/09 13:31 by mkatari