mkatari-bioinformatics-august-2013-clustering
Clustering RNA-seq data
Continuation from DESeq
Get the IDs of the significant genes (as character strings, so that they select rows by name)
sigGenes = as.character(resSig$id)
Get the normalized values for the significant genes
sigGenes.normalized = normalized[sigGenes,]
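Why the IDs should be character strings when subsetting: if the id column happens to be stored as a factor (an assumption; it depends on how the results table was built), R indexes the matrix by the factor's integer codes rather than by its labels. A minimal sketch with a made-up matrix, where normalized.toy and ids are hypothetical names:

```r
# Hypothetical 4-gene x 3-sample matrix standing in for `normalized`
normalized.toy <- matrix(1:12, nrow = 4,
                         dimnames = list(c("g1", "g2", "g3", "g4"),
                                         c("s1", "s2", "s3")))

# IDs stored as a factor: levels are sorted (g2, g4), so the codes are 2, 1
ids <- factor(c("g4", "g2"))

# Indexing by the factor uses the integer codes, not the labels
by.factor <- rownames(normalized.toy[ids, ])

# Indexing by character strings selects the rows by name
by.name <- rownames(normalized.toy[as.character(ids), ])

print(by.factor)
print(by.name)
```

Comparing the two printed vectors shows whether the rows picked are the ones that were asked for.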
Correlate the normalized values between genes (cor works on columns, so we transpose the matrix to put genes in columns)
sigGenes.normalized.cor = cor(t(sigGenes.normalized), 
                              method="pearson")
Convert correlation into distance
sigGenes.normalized.dist = as.dist(1-sigGenes.normalized.cor)
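To see what the 1 - correlation distance does, here is a small sketch on made-up numbers (the toy matrix and its gene names are hypothetical, not part of the workshop data). Genes with the same profile shape end up at distance 0, and genes with mirror-image profiles end up at distance 2.

```r
# Toy matrix: 4 hypothetical genes (rows) x 5 samples (columns)
toy <- rbind(geneA = c(1, 2, 3, 4, 5),
             geneB = c(2, 4, 6, 8, 10),  # same shape as geneA, different scale
             geneC = c(5, 4, 3, 2, 1),   # mirror image of geneA
             geneD = c(1, 3, 2, 5, 4))

# Gene-by-gene Pearson correlation (transpose so genes are columns)
toy.cor <- cor(t(toy), method = "pearson")

# Turn correlation into a distance: r = 1 gives 0, r = -1 gives 2
toy.dist <- as.dist(1 - toy.cor)

print(round(toy.cor, 2))
print(round(as.matrix(toy.dist), 2))
```

Because Pearson correlation ignores scale, geneA and geneB are treated as identical even though their values differ by a factor of two.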
Cluster the genes based on their distance
sigGenes.normalized.hclust = hclust(sigGenes.normalized.dist, 
                                    method="average")
We expect two different outcomes here, so we can simply cut the tree at the level that results in two groups
sigGenes.hclust.k2<-cutree(sigGenes.normalized.hclust, k=2)
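cutree returns a named integer vector giving the group of each gene, so it is easy to check the group sizes and list the members. The sketch below runs the same steps on simulated data (the toy objects and the rising/falling patterns are assumptions made up for illustration).

```r
set.seed(1)

# Simulated data: 10 genes rising across 6 samples, 10 genes falling
pattern <- 1:6
up   <- t(replicate(10, pattern + rnorm(6, sd = 0.3)))
down <- t(replicate(10, rev(pattern) + rnorm(6, sd = 0.3)))
toy  <- rbind(up, down)
rownames(toy) <- paste0("gene", 1:20)

# Same steps as above: correlation, distance, average-linkage clustering
toy.dist   <- as.dist(1 - cor(t(toy), method = "pearson"))
toy.hclust <- hclust(toy.dist, method = "average")

# Cut the tree into two groups
toy.k2 <- cutree(toy.hclust, k = 2)

# Number of genes in each group
print(table(toy.k2))

# Names of the genes in each group
print(split(names(toy.k2), toy.k2))
```

On the real data, table(sigGenes.hclust.k2) gives the same kind of summary.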
Load the cluster library, which provides the silhouette function
library(cluster)
Calculate silhouette values
sigGenes.hclust.k2.sil<-silhouette(sigGenes.hclust.k2, 
                                   sigGenes.normalized.dist)
Plot the silhouette
plot(sigGenes.hclust.k2.sil)
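The average silhouette width can also be used to check whether two groups is a reasonable choice, by comparing it across several values of k. This is a sketch on simulated data, not part of the original analysis; the toy objects are hypothetical and the range 2 to 5 is an arbitrary choice.

```r
library(cluster)
set.seed(1)

# Simulated data: 10 genes rising across 6 samples, 10 genes falling
pattern <- 1:6
up   <- t(replicate(10, pattern + rnorm(6, sd = 0.3)))
down <- t(replicate(10, rev(pattern) + rnorm(6, sd = 0.3)))
toy  <- rbind(up, down)

toy.dist   <- as.dist(1 - cor(t(toy), method = "pearson"))
toy.hclust <- hclust(toy.dist, method = "average")

# Average silhouette width for k = 2..5; values closer to 1 mean tighter,
# better separated groups
avg.width <- sapply(2:5, function(k) {
  sil <- silhouette(cutree(toy.hclust, k = k), toy.dist)
  mean(sil[, "sil_width"])
})
names(avg.width) <- as.character(2:5)

print(round(avg.width, 2))
```

The k with the largest average width is a reasonable candidate for the number of groups.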
Heatmap
library(gplots)
These wrapper functions make it easy to tell the heatmap function to cluster the same way we did above: average linkage on a 1 - Pearson correlation distance
hclust2 <- function(x, method="average", ...) {
  hclust(x, method=method, ...)
}
dist2 <- function(x, ...) {
  as.dist(1-cor(t(x), method="pearson"))
}
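heatmap.2 applies distfun to the matrix itself to cluster the rows and to its transpose to cluster the columns. The sketch below makes those two calls by hand on a made-up matrix (toy is hypothetical) to check that dist2 gives a gene-by-gene distance in one case and a sample-by-sample distance in the other.

```r
hclust2 <- function(x, method="average", ...) {
  hclust(x, method=method, ...)
}
dist2 <- function(x, ...) {
  as.dist(1-cor(t(x), method="pearson"))
}

set.seed(2)
# Hypothetical matrix: 6 genes (rows) x 4 samples (columns)
toy <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# Row clustering: distances between genes
row.d <- dist2(toy)
# Column clustering: distances between samples
col.d <- dist2(t(toy))

print(attr(row.d, "Size"))  # number of genes
print(attr(col.d, "Size"))  # number of samples

# The wrapper clusters with average linkage by default
row.hc <- hclust2(row.d)
print(row.hc$method)
```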
Create the heatmap. We can save it to a PDF file
pdf("heatmap.pdf")
heatmap.2(sigGenes.normalized, 
          col=redgreen(75),
          hclustfun=hclust2,
          distfun=dist2,
          scale="row", 
          cexCol=0.6, 
          Colv=TRUE,
          sepcolor="black",
          dendrogram="both",
          key=TRUE, 
          symkey=FALSE, 
          density.info="none", 
          trace="none", 
          cexRow=0.4)
dev.off()
mkatari-bioinformatics-august-2013-clustering.1381325481.txt.gz · Last modified:  by mkatari
                
                