
Clean SNP file

Save the file as a tab-delimited text file first. Then read it, explicitly setting the separator to tab and header to true. row.names=1 uses the first column as the row names.

read.table("Draft sent to Manny.txt", sep="\t", header=TRUE, row.names=1)->draft
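To see what these arguments do, here is a minimal sketch on a toy file. The file contents and the names toy.file and toy are made up for illustration and are not the workshop data.

```r
# Write a small tab-delimited file to a temporary location (hypothetical data):
# first column = sample names, then a region column and two SNP columns.
toy.file <- tempfile(fileext = ".txt")
writeLines(c("sample\tregion\tsnp1\tsnp2",
             "ind1\tnorth\tAA\tAG",
             "ind2\tsouth\tNA\tGG"),
           toy.file)

# sep="\t" splits on tabs only, header=TRUE takes column names from line 1,
# row.names=1 uses the first column as row names instead of as data.
read.table(toy.file, sep="\t", header=TRUE, row.names=1) -> toy

dim(toy)         # the sample column is now the row names, not a data column
row.names(toy)
is.na(toy$snp1)  # the string NA in the file is read as a missing value
```

It is worth checking dim() and row.names() right after reading the real file, since a wrong separator usually shows up as a single wide column.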

To count missing values (NA), use is.na. It returns TRUE for every missing value, so summing the TRUEs down each column gives the number of NAs per SNP.

apply(is.na(draft), 2, sum) -> draft.snp.na.sum
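A small sketch of the counting step on a toy data frame (the data and the name toy are hypothetical). colSums is shown as an equivalent shortcut. It is not what the original command uses.

```r
# Toy data: 4 individuals (rows) by 3 SNPs (columns), some calls missing.
toy <- data.frame(snp1 = c("AA", NA, "AG", NA),
                  snp2 = c("GG", "GG", NA, "AG"),
                  snp3 = c("CC", "CT", "CC", "CC"))

is.na(toy)                                # logical matrix, TRUE where a call is missing

# The 2 means "work column by column"; TRUE counts as 1 when summed.
apply(is.na(toy), 2, sum) -> toy.na.sum
toy.na.sum                                # snp1 = 2, snp2 = 1, snp3 = 0

colSums(is.na(toy))                       # same counts, computed with colSums
```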

Keep the columns (SNPs) that have <= 7% missing data

draft[ ,which(draft.snp.na.sum <= 0.07*nrow(draft)) ] -> draft.goodsnps
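The 7% cutoff can be checked on a toy data frame with 20 rows, where the cutoff works out to 1.4 missing values per column. The data and names below are hypothetical. drop=FALSE is an addition not in the original command. It keeps the result a data frame even if only one column passes.

```r
# Toy data: 20 individuals, 3 SNPs.
n <- 20
toy <- data.frame(snp1 = rep("AA", n),
                  snp2 = rep("AG", n),
                  snp3 = rep("GG", n))
toy$snp2[1]   <- NA    # 1 of 20 missing = 5%
toy$snp3[1:3] <- NA    # 3 of 20 missing = 15%

apply(is.na(toy), 2, sum) -> toy.na.sum    # 0, 1, 3
0.07 * nrow(toy) -> cutoff                 # 7% of 20 rows

# Keep only columns whose NA count is at or below the cutoff.
toy[ , which(toy.na.sum <= cutoff), drop=FALSE] -> toy.goodsnps
names(toy.goodsnps)                        # snp3 is dropped
```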

Do the same for the genotypes (rows): count the NAs in each row and keep the rows that have <= 7% missing data

apply(is.na(draft.goodsnps), 1, sum) -> draft.goodsnps.na.sum
draft.goodsnps[draft.goodsnps.na.sum<=0.07*ncol(draft.goodsnps),]->draft.goodsnps.goodgen
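The row version works the same way, with 1 instead of 2 as the second argument to apply and ncol instead of nrow for the cutoff. A minimal sketch on hypothetical toy data with 20 columns:

```r
# Toy data: 3 individuals by 20 SNPs, every call "AA" to start with.
toy <- as.data.frame(matrix("AA", nrow=3, ncol=20))
row.names(toy) <- c("ind1", "ind2", "ind3")
toy[2, 1]   <- NA      # ind2: 1 of 20 missing = 5%
toy[3, 1:4] <- NA      # ind3: 4 of 20 missing = 20%

# The 1 means "work row by row".
apply(is.na(toy), 1, sum) -> toy.row.na.sum    # 0, 1, 4

# Keep only rows whose NA count is at or below 7% of the columns.
toy[toy.row.na.sum <= 0.07*ncol(toy), ] -> toy.goodgen
row.names(toy.goodgen)                         # ind3 is dropped
```

Note that in the real data ncol(draft.goodsnps) still includes the regions column, so the cutoff is computed over one more column than there are SNPs.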

Remove the regions column (column 1) and keep only the SNPs.

snponly=draft.goodsnps.goodgen[,2:1259]
row.names(snponly)=row.names(draft.goodsnps.goodgen)
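The 1259 above is the last column number in this particular dataset. As a sketch of an alternative that does not depend on the column count, assuming the regions column is always first, the toy example below (hypothetical data) uses ncol() or a negative index. Subsetting columns of a data frame keeps its row names.

```r
# Toy data: a region column followed by two SNP columns.
toy <- data.frame(region = c("north", "south"),
                  snp1   = c("AA", "AG"),
                  snp2   = c("GG", "GG"),
                  row.names = c("ind1", "ind2"))

toy[ , 2:ncol(toy)] -> toy.snponly        # columns 2 to the last one
toy[ , -1, drop=FALSE] -> toy.snponly2    # everything except column 1

names(toy.snponly)        # only the SNP columns remain
row.names(toy.snponly)    # row names carried over from toy
```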

Check the frequency of the different alleles in the first SNP

table(as.factor(as.character(snponly[,1])))
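By default table leaves out missing values. The sketch below, on a hypothetical vector of calls, shows how to include them with useNA and how to turn counts into proportions with prop.table.

```r
# Toy vector of calls for one SNP, with one missing value.
snp <- c("AA", "AG", "AA", NA, "GG", "AA")

table(snp) -> snp.counts                    # NA is not counted here
snp.counts

table(snp, useNA="ifany") -> snp.counts.na  # adds an NA category when NAs exist
snp.counts.na

prop.table(snp.counts) -> snp.freq          # proportions among non-missing calls
snp.freq
```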