Clustering rna-seq data

continuation from DESeq

Get the significant genes

sigGenes = resSig$id

Get the normalized values for the significant genes

sigGenes.normalized = normalized[sigGenes,]

Correlate the normalized values

sigGenes.normalized.cor = cor(t(sigGenes.normalized), 
                              method="pearson")

Convert correlation into distance

sigGenes.normalized.dist = as.dist(1-sigGenes.normalized.cor)

Cluster the genes based on their distance

sigGenes.normalized.hclust = hclust(sigGenes.normalized.dist, 
                                    method="average")

We expect two different outcomes here so we can simply tell it to use a cutoff that will result in two groups

sigGenes.hclust.k2<-cutree(sigGenes.normalized.hclust, k=2)

Load silhouette library

library(cluster)

Calculate silhouette values

sigGenes.hclust.k2.sil<-silhouette(sigGenes.hclust.k2, 
                                   sigGenes.normalized.dist)

Plot the silhouette

plot(sigGenes.hclust.k2.sil)

Heatmap

library(gplots)

These functions will make it easy for us to specify how we want the clustering to be performed in the heatmap function

hclust2 <- function(x, method="average", ...) {
  hclust(x, method=method, ...)
}

dist2 <- function(x, ...) {
  as.dist(1-cor(t(x), method="pearson"))  			
}

Create heatmap. We can save it to a pdf file

pdf("heatmap.pdf")
heatmap.2(sigGenes.normalized, 
          col=redgreen(75),
          hclustfun=hclust2,
          distfun=dist2,
          scale="row", 
          cexCol=0.6, 
          Colv=TRUE,
          sepcolor="black",
          dendrogram="both",
          key=TRUE, 
          symkey=FALSE, 
          density.info="none", 
          trace="none", 
          cexRow=0.4)
dev.off()