mkatari-bioinformatics-august-2013-clustering
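Clustering RNA-seq data
This section continues from the DESeq section. It assumes that resSig (the table of significant genes) and normalized (the normalized counts) from that section are still in the workspace.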
Get the ids of the significant genes (as.character() guards against the id column being stored as a factor)
sigGenes = as.character(resSig$id)
Get the normalized values for the significant genes
sigGenes.normalized = normalized[sigGenes,]
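Subsetting by gene id only works as intended when the ids are character strings that match the row names of the matrix. The sketch below uses a made-up matrix (toy.normalized) and made-up ids (toy.ids), both hypothetical, to show what happens if the ids are a factor. Whether resSig$id is a factor in your session is an assumption to check, not something this page states.

```r
# Hypothetical normalized count matrix: gene ids as row names, samples as columns
toy.normalized <- matrix(
  c(10, 12, 11,
    50, 48, 95,
     0,  1,  0,
    30, 31,  8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2", "s3"))
)

# Hypothetical ids of significant genes, stored as a factor to show the pitfall
toy.ids <- factor(c("g4", "g2"))

# A factor index is treated as its integer codes, not its labels.
# The levels sort to g2, g4, so the codes are 2 and 1: rows 2 and 1 are returned.
wrong <- toy.normalized[toy.ids, ]

# Converting to character selects rows by name; drop = FALSE keeps a matrix
# even when only one gene is selected
right <- toy.normalized[as.character(toy.ids), , drop = FALSE]

print(rownames(wrong))
print(rownames(right))
```

It can also help to confirm all(sigGenes %in% rownames(normalized)) before subsetting, since a character id with no matching row name stops with a "subscript out of bounds" error.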
Correlate the normalized values between genes. cor() correlates columns, so the matrix is transposed to put the genes in columns
sigGenes.normalized.cor = cor(t(sigGenes.normalized), method="pearson")
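Convert correlation into distance. With 1 - r, perfectly correlated genes are at distance 0 and perfectly anti-correlated genes are at distance 2
sigGenes.normalized.dist = as.dist(1-sigGenes.normalized.cor)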
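The correlation and distance steps can be tried on a small example. This is a minimal sketch on a made-up matrix (toy, with hypothetical genes geneA to geneD), not the workshop data.

```r
# Toy matrix: 4 hypothetical genes (rows) x 5 samples (columns)
toy <- rbind(
  geneA = c(1, 2, 3, 4, 5),
  geneB = c(2, 4, 6, 8, 10),   # same shape as geneA, different scale
  geneC = c(5, 4, 3, 2, 1),    # mirror image of geneA
  geneD = c(3, 1, 4, 1, 5)     # unrelated pattern
)

# cor() correlates columns, so transpose to put genes in columns
toy.cor <- cor(t(toy), method = "pearson")

# 1 - r maps r = 1 to distance 0 and r = -1 to distance 2
toy.dist <- as.dist(1 - toy.cor)

print(round(as.matrix(toy.dist), 3))
```

geneA and geneB end up at distance 0 even though their magnitudes differ tenfold, because Pearson correlation only compares the shape of the profiles. geneA and geneC end up at the maximum distance of 2.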
Cluster the genes based on their distance
sigGenes.normalized.hclust = hclust(sigGenes.normalized.dist, method="average")
We expect two groups of genes here, so we can simply ask cutree to cut the tree at whatever height yields exactly two clusters (k=2)
sigGenes.hclust.k2<-cutree(sigGenes.normalized.hclust, k=2)
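To see what cutree returns, here is a minimal sketch on simulated data. The two expression patterns (pattern.up, pattern.down), the noise level, and the gene names are all made up for illustration.

```r
set.seed(1)

# Hypothetical design: 3 control samples followed by 3 treated samples
pattern.up   <- c(5, 5, 5, 10, 10, 10)
pattern.down <- c(10, 10, 10, 5, 5, 5)

# 10 genes that follow each pattern, with a little noise added
toy <- rbind(
  t(replicate(10, pattern.up   + rnorm(6, sd = 0.5))),
  t(replicate(10, pattern.down + rnorm(6, sd = 0.5)))
)
rownames(toy) <- paste0("gene", 1:20)

# Same steps as above: correlation distance, average linkage, cut into 2 groups
toy.hclust <- hclust(as.dist(1 - cor(t(toy))), method = "average")
toy.k2 <- cutree(toy.hclust, k = 2)

# cutree returns a named integer vector: gene id -> cluster number
print(table(toy.k2))

# Pull out the ids of the genes in cluster 1
cluster1.genes <- names(toy.k2)[toy.k2 == 1]
print(cluster1.genes)
```

The cluster numbers themselves carry no meaning. cutree numbers the clusters in the order they first appear in the data, so which group is called 1 depends on the order of the genes.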
Load the cluster library, which provides the silhouette function
library(cluster)
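Calculate silhouette values. A value near 1 means a gene sits well inside its cluster. A value near 0 or below means it is about as close to the other cluster
sigGenes.hclust.k2.sil<-silhouette(sigGenes.hclust.k2, sigGenes.normalized.dist)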
Plot the silhouette
plot(sigGenes.hclust.k2.sil)
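The silhouette can also be used to check whether two groups is a reasonable choice, by comparing the average silhouette width for several values of k. This is a sketch on simulated data (the patterns and gene names are hypothetical), and the range 2 to 5 is an arbitrary choice.

```r
library(cluster)

set.seed(1)

# Hypothetical data: 10 genes going up and 10 going down across 6 samples
pattern.up   <- c(5, 5, 5, 10, 10, 10)
pattern.down <- c(10, 10, 10, 5, 5, 5)
toy <- rbind(
  t(replicate(10, pattern.up   + rnorm(6, sd = 0.5))),
  t(replicate(10, pattern.down + rnorm(6, sd = 0.5)))
)
rownames(toy) <- paste0("gene", 1:20)

toy.dist   <- as.dist(1 - cor(t(toy)))
toy.hclust <- hclust(toy.dist, method = "average")

# Average silhouette width for each candidate number of clusters
candidate.k <- 2:5
avg.width <- sapply(candidate.k, function(k) {
  sil <- silhouette(cutree(toy.hclust, k = k), toy.dist)
  mean(sil[, "sil_width"])
})
names(avg.width) <- candidate.k

print(round(avg.width, 3))

# The k with the largest average width is the best supported by this measure
best.k <- candidate.k[which.max(avg.width)]
print(best.k)
```

On real data the maximum is rarely this clear cut, so treat the result as one piece of evidence alongside what you expect biologically.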
Heatmap
library(gplots)
heatmap.2 clusters with Euclidean distance and complete linkage by default. These two wrapper functions let us tell it to use the same Pearson correlation distance and average linkage as above
hclust2 <- function(x, method="average", ...) {
  hclust(x, method=method, ...)
}
dist2 <- function(x, ...) {
  as.dist(1-cor(t(x), method="pearson"))
}
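As a quick check, the wrappers should reproduce exactly the tree built by hand in the earlier steps. The sketch below verifies this on a random matrix (toy, with hypothetical gene and sample names). It also shows the column case: heatmap.2 passes the transposed matrix to distfun when clustering columns, so the same dist2 then correlates samples instead of genes.

```r
# The two wrappers, as defined above
hclust2 <- function(x, method = "average", ...) {
  hclust(x, method = method, ...)
}
dist2 <- function(x, ...) {
  as.dist(1 - cor(t(x), method = "pearson"))
}

set.seed(2)

# Hypothetical data: 8 genes x 5 samples of random values
toy <- matrix(rnorm(8 * 5), nrow = 8,
              dimnames = list(paste0("gene", 1:8), paste0("s", 1:5)))

# Tree built step by step, as in the clustering section
manual <- hclust(as.dist(1 - cor(t(toy), method = "pearson")), method = "average")

# Tree built through the wrappers, as heatmap.2 will do for the rows
helper <- hclust2(dist2(toy))

same.tree <- identical(manual$merge, helper$merge) &&
  isTRUE(all.equal(manual$height, helper$height))
print(same.tree)

# For the columns, heatmap.2 calls distfun on t(x): samples are clustered
# by the correlation of their profiles across genes
col.tree <- hclust2(dist2(t(toy)))
print(col.tree$labels)
```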
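Create the heatmap and save it to a PDF file. The file is only complete once dev.off() closes the device. as.matrix() is a safeguard, because heatmap.2 requires a numeric matrix
pdf("heatmap.pdf")
heatmap.2(as.matrix(sigGenes.normalized),
          col=redgreen(75),       # 75-step red/green palette
          hclustfun=hclust2,      # average linkage
          distfun=dist2,          # 1 - Pearson correlation
          scale="row",            # z-score each gene for display
          cexCol=0.6,             # column label size
          Colv=TRUE,              # cluster the samples too
          sepcolor="black",       # only used if colsep/rowsep are set
          dendrogram="both",      # draw row and column trees
          key=TRUE,               # show the color key
          symkey=FALSE,           # key not forced symmetric around 0
          density.info="none",    # no histogram in the key
          trace="none",           # no trace lines over the cells
          cexRow=0.4)             # row label size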
dev.off()
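The scale="row" argument converts each gene to z-scores (subtract the row mean, divide by the row standard deviation) before the colors are assigned, so highly and lowly expressed genes can share one color scale. A minimal sketch of the same transformation in base R, on a made-up two-gene matrix:

```r
# Hypothetical genes with the same profile at very different expression levels
toy <- rbind(
  geneA = c(10, 20, 30, 40),
  geneB = c(1000, 2000, 3000, 4000)
)

# scale() standardizes columns, so transpose, scale, and transpose back
toy.z <- t(scale(t(toy)))

print(round(toy.z, 3))

# After row scaling every gene has mean 0 and standard deviation 1
print(round(rowMeans(toy.z), 3))
print(apply(toy.z, 1, sd))
```

Without row scaling, geneB would dominate the color range and geneA would appear as a single flat color. After scaling, the two rows are identical.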
mkatari-bioinformatics-august-2013-clustering.1381325481.txt.gz · Last modified: by mkatari
