[[mkatari-bioinformatics-august-2013|Back to Manny's Bioinformatics Workshop Home]] ====== Volcano Plot using sample Microarray Data ====== In this exercise we will practice our R-coding skills by analyzing a [[https://docs.google.com/file/d/0B172nc4dAaaONlZ4QXByRThPNXM/edit?usp=sharing|sample microarray dataset]] to create a [[http://en.wikipedia.org/wiki/Volcano_plot_(statistics)|Volcano plot]]. A volcano plot is a great visual aid to identify genes that are differentially expressed. The Log Fold Change is plotted on the x-axis and the -1 * Log base 10 of the p-value from a statistical test is plotted on the y-axis. Significantly differentially expressed genes have a p-value less than a cutoff (usually 0.05) and a log fold change of atleast 1.5x (plus or minus). We will separate the exercise into 5 different steps. - Load the file into the R workspace - Calculate the log fold change for each gene - Calculate the p-value for each gene using t-test - Create the Volcano plot - Create a boxplot of expression values from genes that are significantly induced. ===== Read the file into R workspace ===== expvalues = read.table("expvalues.txt", header=T) ===== Calculate log fold change for each gene ===== ==== Using a loop ==== expvaluesfc = numeric() for (i in 1:nrow(expvalues)) { trtmean = mean(as.numeric(expvalues[i,4:6])) ctrlmean=mean(as.numeric(expvalues[i,1:3])) expvaluesfc[i] = log2(trtmean/ctrlmean) } ===== Calculate p-value using t-test for each gene ===== ==== Using a loop ==== expvaluestt= numeric() for (j in 1:nrow(expvalues)) { t.test(as.numeric(expvalues[j,1:3]), as.numeric(expvalues[j,4:6]), var.equal=TRUE)->sample.tt expvaluestt[j]=sample.tt$p.value } ===== Creating a volcano plot ===== expvalueslog = -1*log(expvaluestt) plot(expvaluesfc, expvalueslog) abline(v=-1, col="blue") abline(v=1, col="blue") abline(h=29.95, col="red") ===== A boxplot of induced genes ===== expvaluefc2tt05 = expvaluesfc > 1 & expvaluestt < 0.05 expvalues[which(expvaluefc2tt05),]->inducedGenes boxplot(inducedGenes)