mkatari-bioinformatics-august-2013-cleansnp
Clean SNP file
Save the file as a tab-delimited text file first. Then read it, explicitly setting the delimiter to tab and header to true.
read.table("Draft sent to Manny.txt", sep="\t", header=TRUE, row.names=1)->draft
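As a minimal sketch of what this call does, the example below writes a small made-up table to a temporary file and reads it back with the same arguments. The file contents and the column names (region, snp1, snp2) are assumptions for illustration only, not the workshop data.

```r
# Hypothetical toy data written to a temporary file (not the workshop file).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tregion\tsnp1\tsnp2",
             "g1\tA\tAA\tAG",
             "g2\tB\tNA\tGG",
             "g3\tA\tAG\tNA"), tmp)

# Same arguments as above: tab delimiter, header row, first column as row names.
toy <- read.table(tmp, sep = "\t", header = TRUE, row.names = 1,
                  stringsAsFactors = FALSE)

dim(toy)          # rows (genotypes) and columns (region + SNPs)
sum(is.na(toy))   # fields containing the string NA are read as missing
```

By default read.table treats the string NA as a missing value; if the file marks missing calls differently, the na.strings argument can be set accordingly.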
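To count missing values (NA), use is.na. It returns TRUE for every missing value, and since TRUE counts as 1, summing each column gives the number of missing values per column.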
apply(is.na(draft), 2, sum) -> draft.snp.na.sum
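To see what the per-column count looks like, here is a small sketch on made-up data. The toy data frame is an assumption; colSums is shown only as an equivalent base R shortcut, not as the method used in the workshop.

```r
# Hypothetical toy genotype table with a few missing calls.
toy <- data.frame(snp1 = c("AA", NA, "AG", "AA"),
                  snp2 = c(NA, NA, "GG", "AG"),
                  snp3 = c("CC", "CT", "CC", "TT"),
                  stringsAsFactors = FALSE)

# is.na() gives a logical matrix; summing over columns (margin 2) counts NAs.
na.by.apply <- apply(is.na(toy), 2, sum)

# colSums() on the same logical matrix gives the same counts.
na.by.colsums <- colSums(is.na(toy))

na.by.apply
```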
Identify columns (SNPs) that have <= 7% missing data
draft[ ,which(draft.snp.na.sum <= 0.07*nrow(draft)) ] -> draft.goodsnps
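The column filter can be sketched on a toy table as follows. The data and the variable name max.missing are assumptions for illustration; drop = FALSE is added so the result stays a data frame even if only one column passes.

```r
# Hypothetical toy table: 20 genotypes, 3 SNPs with 0, 1 and 2 missing calls.
toy <- data.frame(snp1 = rep("AA", 20),
                  snp2 = c(NA, rep("AG", 19)),
                  snp3 = c(NA, NA, rep("GG", 18)),
                  stringsAsFactors = FALSE)

max.missing <- 0.07                      # allowed fraction of missing data
na.per.snp <- colSums(is.na(toy))        # missing calls per SNP

# Keep SNPs whose NA count is at most 7% of the number of rows (here 1.4).
keep <- na.per.snp <= max.missing * nrow(toy)
toy.goodsnps <- toy[, keep, drop = FALSE]

names(toy.goodsnps)
```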
Do the same for the genotypes (rows): count the missing values in each row and keep the rows with <= 7% missing data
apply(is.na(draft.goodsnps), 1, sum) -> draft.goodsnps.na.sum
draft.goodsnps[draft.goodsnps.na.sum<=0.07*ncol(draft.goodsnps),]->draft.goodsnps.goodgen
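The row filter works the same way with the margins swapped. A minimal sketch on made-up data (the toy table and genotype names are assumptions):

```r
# Hypothetical toy table: 3 genotypes, 20 SNPs, all calls "AA" to start with.
toy <- as.data.frame(matrix("AA", nrow = 3, ncol = 20),
                     stringsAsFactors = FALSE)
rownames(toy) <- c("g1", "g2", "g3")
toy["g2", 1] <- NA        # g2 has one missing call
toy["g3", 1:2] <- NA      # g3 has two missing calls

# Missing calls per genotype (row).
na.per.genotype <- rowSums(is.na(toy))

# Keep genotypes whose NA count is at most 7% of the number of SNPs (here 1.4).
toy.goodgen <- toy[na.per.genotype <= 0.07 * ncol(toy), , drop = FALSE]

rownames(toy.goodgen)
```

Note that the 7% here is computed against the columns that survived the SNP filter, so the order of the two filtering steps can change which genotypes are kept.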
Remove the regions column (the first column) and keep only the SNPs.
snponly=draft.goodsnps.goodgen[,2:1259]
row.names(snponly)=row.names(draft.goodsnps.goodgen)
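The index 2:1259 is specific to this data set. A sketch of a variant that does not depend on the number of columns, assuming (as above) that the regions column is the first one; the toy data and the column name region are made up for illustration:

```r
# Hypothetical toy table with a region column followed by SNP columns.
toy <- data.frame(region = c("A", "B"),
                  snp1 = c("AA", "AG"),
                  snp2 = c("GG", "GG"),
                  row.names = c("g1", "g2"),
                  stringsAsFactors = FALSE)

# A negative index drops the first column whatever the total column count is.
snponly <- toy[, -1, drop = FALSE]

names(snponly)
rownames(snponly)   # row names are carried over by the subsetting itself
```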
Check the frequency of the different alleles in the first SNP
table(as.factor(as.character(snponly[,1])))
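To look at every SNP rather than just the first, the same table call can be applied to each column. This is a sketch on made-up data, not part of the original workshop steps:

```r
# Hypothetical toy SNP table.
snponly <- data.frame(snp1 = c("AA", "AG", "AA", NA),
                      snp2 = c("GG", "GG", "AG", "GG"),
                      stringsAsFactors = FALSE)

# A data frame is a list of columns, so lapply() tabulates each SNP in turn.
# useNA = "ifany" also reports missing calls when a column has them.
counts <- lapply(snponly, table, useNA = "ifany")

# Proportions instead of counts (missing calls excluded).
props <- lapply(snponly, function(x) prop.table(table(x)))

counts$snp1
props$snp2
```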
mkatari-bioinformatics-august-2013-cleansnp.txt · Last modified: 2013/08/19 14:33 by mkatari