mkatari-bioinformatics-august-2013-clustering
Differences
This shows you the differences between two versions of the page.
mkatari-bioinformatics-august-2013-clustering [2014/12/11 14:41] – mkatari | mkatari-bioinformatics-august-2013-clustering [2015/06/17 12:31] – mkatari | ||
---|---|---|---|
Line 4: | Line 4: | ||
====== Clustering RNA-seq data ====== | ====== Clustering RNA-seq data ====== | ||
Continuation from [[mkatari-bioinformatics-august-2013-deseq|DESeq]] | Continuation from [[mkatari-bioinformatics-august-2013-deseq|DESeq]] | ||
+ | |||
+ | [[https:// | ||
+ | [[https:// | ||
Get the significant genes | Get the significant genes | ||
Line 70: | Line 73: | ||
< | < | ||
- | sigGenesMean = rowMeans(sigGenes.normalized) | + | # this function takes a vector of gene expression values. |
- | sigGenesSD | + | scaleData <- function(x) { |
+ | x = as.numeric(x) | ||
+ | | ||
+ | sdx = sd(x) | ||
+ | y = (x-meanx)/ | ||
+ | return(y) | ||
+ | } | ||
</ | </ | ||
+ | |||
+ | We need to transpose the result because apply() returns each gene as a column rather than a row. | ||
+ | |||
+ | < | ||
+ | scaledSigGenes = t(apply(sigGenes.normalized, | ||
+ | colnames(scaledSigGenes)=colnames(sigGenes.normalized) | ||
+ | </ | ||
+ | |||
+ | Now run k-means; in this case we start with 2 clusters. | ||
+ | |||
+ | < | ||
+ | SigGenes.kmeans.2 = kmeans(scaledSigGenes, | ||
+ | </ | ||
+ | |||
+ | To measure how well the clustering performed, we can compare the between-cluster sum of squares with the total sum of squares. The higher the ratio, the better. | ||
+ | |||
+ | < | ||
+ | SigGenes.kmeans.2$betweenss/ | ||
+ | </ | ||
+ | |||
+ | To determine the ideal value of k, we can try many different values and compare how well each one performs. | ||
+ | |||
+ | < | ||
+ | getBestK <- function(x) { | ||
+ | kmeans_ss=numeric() | ||
+ | kmeans_ss[1]=0 | ||
+ | | ||
+ | for (i in 2:20) { | ||
+ | | ||
+ | # | ||
+ | # | ||
+ | |||
+ | # | ||
+ | | ||
+ | | ||
+ | |||
+ | |||
+ | } | ||
+ | return(kmeans_ss) | ||
+ | } | ||
+ | |||
+ | kmeans_ss=getBestK(scaledSigGenes) | ||
+ | plot(kmeans_ss) | ||
+ | |||
+ | </ | ||
+ | To get the genes in the different clusters: | ||
+ | < | ||
+ | SigGenes.kmeans.2.group1 = names(which(SigGenes.kmeans.2$cluster==1)) | ||
+ | SigGenes.kmeans.2.group2 = names(which(SigGenes.kmeans.2$cluster==2)) | ||
+ | </ | ||
+ | |||
+ | |||
+ | The code below plots k-means clustering results. You simply have to provide the k-means output and the labels. | ||
+ | |||
+ | < | ||
+ | plotClusterCenters< | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | mycolors=c(" | ||
+ | centersdim = dim(kmeansres$centers) | ||
+ | plot(kmeansres$centers[1, | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | axis(1, at=c(1: | ||
+ | | ||
+ | for (i in 2: | ||
+ | lines(kmeansres$centers[i, | ||
+ | } | ||
+ | | ||
+ | } | ||
+ | |||
+ | |||
+ | plotClusterCenters(SigGenes.kmeans.2) | ||
+ | </ | ||
+ | |||
====== Heatmap ====== | ====== Heatmap ====== |
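
The clustering steps added in this revision (scaling each gene, running k-means, checking the sum-of-squares ratio, trying several values of k, and listing the genes in each cluster) can be sketched end to end as follows. This is a minimal illustration, not the page's exact code: `toyCounts` is a simulated stand-in for `sigGenes.normalized`, the names `scaleRow`, `toyScaled`, `toyKmeans2` and `ratioByK` are hypothetical, and `nstart = 10` and the range `2:8` are arbitrary choices. Because the body of `getBestK` is truncated in the diff, the loop below simply records `betweenss / totss` for each k, which is an assumption about what that function stores.

```r
# Simulated stand-in for sigGenes.normalized (assumption, not the tutorial data):
# 40 genes x 6 samples, half trending up and half trending down across samples.
set.seed(1)
trend <- seq(-1, 1, length.out = 6)
up   <- t(replicate(20, 100 + 50 * trend + rnorm(6, sd = 5)))
down <- t(replicate(20, 100 - 50 * trend + rnorm(6, sd = 5)))
toyCounts <- rbind(up, down)
rownames(toyCounts) <- paste0("gene", 1:40)
colnames(toyCounts) <- paste0("sample", 1:6)

# Scale one gene's expression vector to mean 0 and standard deviation 1.
scaleRow <- function(x) {
  x <- as.numeric(x)
  (x - mean(x)) / sd(x)
}

# apply() over rows returns one column per gene, so transpose back to genes x samples.
toyScaled <- t(apply(toyCounts, 1, scaleRow))
colnames(toyScaled) <- colnames(toyCounts)

# k-means with 2 clusters; nstart reruns from several random starts and keeps the best.
toyKmeans2 <- kmeans(toyScaled, centers = 2, nstart = 10)

# Fraction of the total sum of squares explained by the clustering (higher is better).
ratio2 <- toyKmeans2$betweenss / toyKmeans2$totss

# Try a range of k and record the same ratio for each value.
ratioByK <- sapply(2:8, function(k) {
  fit <- kmeans(toyScaled, centers = k, nstart = 10)
  fit$betweenss / fit$totss
})
names(ratioByK) <- 2:8

# Gene names assigned to each of the two clusters.
group1 <- names(which(toyKmeans2$cluster == 1))
group2 <- names(which(toyKmeans2$cluster == 2))
```

Calling `plot(2:8, ratioByK, type = "b")` afterwards gives a curve for judging k: the ratio generally rises as k grows, so a common heuristic is to look for the point where the gain flattens rather than for the maximum.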
mkatari-bioinformatics-august-2013-clustering.txt · Last modified: 2015/06/17 13:26 by mkatari