This is an old revision of the document!
Table of Contents
Back to Manny's Bioinformatics Workshop HOME
Clustering rna-seq data
continuation from DESeq
Get the significant genes
sigGenes = resSig$id
Get the normalized values for the significant genes
sigGenes.normalized = normalized[sigGenes,]
Correlate the normalized values
sigGenes.normalized.cor = cor(t(sigGenes.normalized), method="pearson")
Convert correlation into distance
sigGenes.normalized.dist = as.dist(1-sigGenes.normalized.cor)
Cluster the genes based on their distance
sigGenes.normalized.hclust = hclust(sigGenes.normalized.dist, method="average")
We expect two different outcomes here so we can simply tell it to use a cutoff that will result in two groups
sigGenes.hclust.k2<-cutree(sigGenes.normalized.hclust, k=2)
Now to get all the genes that are in cluster 2 simply type.
hclust.k2.cluster2=names(which(sigGenes.hclust.k2==2))
Now we can create a new matrix/data frame with just these genes. This new matrix can be used to plot a heatmap to make it easier to see a expression profile of the cluster (see below).
hclust.k2.cluster2.normalized = sigGenes.normalized[hclust.k2.cluster2,]
Load silhouette library
library(cluster)
Calculate silhouette values
sigGenes.hclust.k2.sil<-silhouette(sigGenes.hclust.k2, sigGenes.normalized.dist)
Plot the silhouette
plot(sigGenes.hclust.k2.sil)
K-means
Heatmap
library(gplots)
These functions will make it easy for us to specify how we want the clustering to be performed in the heatmap function
hclust2 <- function(x, method="average", ...) { hclust(x, method=method, ...) } dist2 <- function(x, ...) { as.dist(1-cor(t(x), method="pearson")) }
Create heatmap. We can save it to a pdf file. Note that sigGenes.normalized is just a matrix. Here we can provide any matrix of values, for example hclust.k2.cluster2.normalized which is the expression values of genes in cluster 2 (see above)
pdf("heatmap.pdf") heatmap.2(sigGenes.normalized, col=redgreen(75), hclustfun=hclust2, distfun=dist2, scale="row", cexCol=0.6, Colv=TRUE, sepcolor="black", dendrogram="both", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.4) dev.off()