tutorials:population-diversity:snp-chips
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionLast revisionBoth sides next revision | ||
tutorials:population-diversity:snp-chips [2020/09/11 11:11] – [Data analysis workflow with Plink 1.9] bngina | tutorials:population-diversity:snp-chips [2020/09/22 10:09] – [Data analysis workflow with Plink 1.9] bngina | ||
---|---|---|---|
Line 107: | Line 107: | ||
- | ==define | + | We will run all the commands using slurm, so we prepare a script |
+ | We will begin with running basic statistics to explore the dataset, which will enable quality check of data leading to data filtering in particular for poorly genotyped individuals and poorly called marker across the individuals. | ||
+ | ==prepare your script == | ||
+ | < | ||
+ | #!/bin/bash | ||
+ | #SBATCH -p batch | ||
+ | #SBATCH -n 2 | ||
+ | #SBATCH -o / | ||
+ | #SBATCH -e / | ||
+ | #SBATCH -J plink | ||
+ | #load the tool | ||
+ | module load plink/1.9 | ||
+ | |||
+ | #define file path variables | ||
+ | |||
+ | </ | ||
+ | |||
+ | For the input ped and map files, you only need specify the path to the files and give the prefix used to name both files as is the norm, and plink will automatically fill the extension (.ped and .map). | ||
+ | |||
+ | < | ||
+ | in_file='/ | ||
+ | |||
+ | out='/ | ||
+ | |||
+ | #load tools to be used | ||
+ | |||
+ | module load plink/1.9 | ||
+ | </ | ||
+ | |||
+ | |||
+ | It is recommended to compress the files in order to use them with plink, the ped and map files carry a lot information and are quite big, hence we convert them to binary files within plink for faster computation. | ||
+ | |||
+ | < | ||
+ | ########## | ||
+ | |||
+ | #the (--file) tells plink where the file is, it automatically appends the extension) | ||
+ | |||
+ | plink --file ${file} \ | ||
+ | --out ${out}/ | ||
+ | | ||
+ | |||
+ | </ | ||
+ | Above creates three files in the specified output directory '' | ||
+ | |||
+ | *// | ||
+ | *// | ||
+ | | ||
+ | |||
+ | Now we use the created binary files, indicated to plink using '' | ||
+ | |||
+ | -Look a the individuals with missing data and SNPs not typed in all the individuals | ||
+ | |||
+ | < | ||
+ | |||
+ | |||
+ | |||
+ | ######### summary statistics ######## | ||
+ | |||
+ | # | ||
+ | |||
+ | plink --bfile ${out}/ | ||
+ | --out ${out}/ | ||
+ | | ||
+ | |||
+ | </ | ||
+ | |||
+ | This creates two files. | ||
+ | |||
+ | *// | ||
+ | *// | ||
+ | |||
+ | |||
+ | |||
+ | #The missing information found in the '' | ||
+ | < | ||
+ | FID IID MISS_PHENO | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | 10 | ||
+ | |||
+ | </ | ||
+ | |||
+ | The information in each header is as follows; | ||
+ | < | ||
+ | FID Family ID | ||
+ | IID Individual ID | ||
+ | MISS_PHENO | ||
+ | N_MISS | ||
+ | N_GENO | ||
+ | F_MISS | ||
+ | </ | ||
+ | |||
+ | The information found in the '' | ||
+ | < | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | The information in each column is as follows; | ||
+ | < | ||
+ | SNP SNP identifier | ||
+ | CHR Chromosome number | ||
+ | N_MISS | ||
+ | N_GENO | ||
+ | F_MISS | ||
+ | </ | ||
+ | |||
+ | We can generate a file with filters added for the rate missing data in individuals '' | ||
===== Data analysis workflow with R and adegenet ===== | ===== Data analysis workflow with R and adegenet ===== | ||
+ | |||
+ | < | ||
+ | # Working on ped/map files with adegenet | ||
+ | library(adegenet) | ||
+ | library(pegas) | ||
+ | setwd(" | ||
+ | |||
+ | |||
+ | n_markers = 59727 # for now you have to get this number first, which can be easily gotten | ||
+ | # since it is the number of lines in the ped file (do 'wc -l' on that file in the shell) | ||
+ | # We first read the ped file into a genind object: | ||
+ | dat <- read.loci(" | ||
+ | | ||
+ | |||
+ | # the above takes *lots* of time. It wasn't even able to finish in reasonable time on my computer. | ||
+ | # It is much better to first convert the .ped file to a .raw file using Plink, through the following command: | ||
+ | # plink1.9 --ped ' | ||
+ | # and then import the resulting " | ||
+ | |||
+ | read.PLINK(" | ||
+ | # the above is still long, but reasonably so. | ||
+ | |||
+ | save(dat, file = " | ||
+ | # In any case, now we have a compact genlight object, better write it to the disk so that next time we will load it | ||
+ | # straight ahead, skipping the stuff above, and instead of that, next time we will do a simple: | ||
+ | load(" | ||
+ | |||
+ | # summary of the object, checking everything is ok | ||
+ | dat | ||
+ | indNames(dat)[1: | ||
+ | locNames(dat)[1: | ||
+ | |||
+ | </ |
tutorials/population-diversity/snp-chips.txt · Last modified: 2020/09/22 10:21 by bngina