Differences
This shows you the differences between two versions of the page.
tutorials:population-diversity:snp-chips [2020/09/17 15:22] – [Data analysis workflow with Plink 1.9] bngina | tutorials:population-diversity:snp-chips [2020/09/22 10:21] (current) – [Data analysis workflow with Plink 1.9] bngina | ||
---|---|---|---|
Line 127: | Line 127: | ||
#define file path variables | #define file path variables | ||
- | #for the input ped and map files, you only need specify the path to the files and give the prefix used to name both files as is the norm, and plink will automatically fill the extension (.ped and .map). | + | </ |
- | in_file='/ | + | For the input ped and map files, you only need to specify the path to the files and the prefix shared by both files (as is the norm); plink automatically appends the extensions (.ped and .map). |
- | #directory to store output files, first create this directory(plink_out_files) in your home working directory in order to reference it here | + | < |
+ | in_file='/ | ||
out='/ | out='/ | ||
Line 138: | Line 139: | ||
module load plink/1.9 | module load plink/1.9 | ||
+ | </ | ||
- | ########## | ||
- | #its recommended to compress the files in order to use them with plink, the ped and map files carry a lot information are quite big, hence we convert them to binary files within plink for faster computation' | + | It is recommended to compress the files into plink's binary format before using them with plink; the ped and map files carry a lot of information |
+ | |||
+ | < | ||
+ | ########## | ||
#the --file flag tells plink where the files are; it automatically appends the extensions | #the --file flag tells plink where the files are; it automatically appends the extensions | ||
Line 149: | Line 153: | ||
| | ||
+ | </ | ||
+ | The command above creates three files in the specified output directory '' |
+ | *// | ||
+ | *// | ||
+ | | ||
+ | |||
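The conversion step can be sketched as two small shell functions. This is a minimal sketch, not the tutorial's own command (which is only partly visible here): the function names `make_binary` and `check_binary_fileset` are hypothetical, and it assumes plink 1.9 is on the PATH, for instance after `module load plink/1.9`. It uses the standard plink 1.9 flags `--file`, `--make-bed` and `--out`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a text ped/map fileset to plink binary format.
# Assumes plink 1.9 is on PATH (e.g. after `module load plink/1.9`).
# --file reads <in_prefix>.ped and <in_prefix>.map;
# --make-bed writes <out_prefix>.bed, <out_prefix>.bim and <out_prefix>.fam.
make_binary() {
  local in_prefix="$1" out_prefix="$2"
  plink --file "$in_prefix" --make-bed --out "$out_prefix"
}

# Hypothetical helper: succeed only if all three binary files exist and are
# non-empty; report each missing file on stderr.
check_binary_fileset() {
  local prefix="$1" ext missing=0
  for ext in bed bim fam; do
    if [ ! -s "${prefix}.${ext}" ]; then
      echo "missing: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `make_binary /path/to/mydata ${out}/mydata && check_binary_fileset ${out}/mydata`, where `/path/to/mydata` and `mydata` are placeholders.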
+ | We now use the binary files created above, passing them to plink with '' |
+ | |||
+ | -Look at the individuals with missing data and at SNPs not genotyped in all individuals |
+ | |||
+ | < | ||
+ | |||
+ | |||
+ | |||
+ | ######### summary statistics ######## | ||
+ | |||
+ | # | ||
+ | |||
+ | plink --bfile ${out}/ | ||
+ | --out ${out}/ | ||
+ | | ||
</ | </ | ||
+ | This creates two files. | ||
+ | *// | ||
+ | *// | ||
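A minimal sketch of the missing-data summary, assuming the standard plink 1.9 `--missing` flag, which writes an `.imiss` (per-individual) and an `.lmiss` (per-SNP) report. The wrapper name `run_missing` and its arguments are hypothetical; the final loop only confirms that both reports exist before later steps read them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the plink 1.9 --missing summary.
# Assumes plink is on PATH and <bfile_prefix>.bed/.bim/.fam exist.
# plink writes <out_prefix>.imiss (per individual) and
# <out_prefix>.lmiss (per SNP).
run_missing() {
  local bfile_prefix="$1" out_prefix="$2" ext
  plink --bfile "$bfile_prefix" --missing --out "$out_prefix" || return 1
  # Confirm both reports were written before moving on.
  for ext in imiss lmiss; do
    [ -e "${out_prefix}.${ext}" ] || { echo "no ${out_prefix}.${ext}" >&2; return 1; }
  done
}
```

For example, `run_missing ${out}/mydata ${out}/mydata_missing`, with both prefixes as placeholders.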
+ | The missing-data information in the '' |
+ | < | ||
+ | FID IID MISS_PHENO | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | 10 | ||
+ | </ | ||
+ | |||
+ | The columns are defined as follows: |
+ | < | ||
+ | FID Family ID | ||
+ | IID Individual ID | ||
+ | MISS_PHENO | ||
+ | N_MISS | ||
+ | N_GENO | ||
+ | F_MISS | ||
+ | </ | ||
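To act on the `F_MISS` column, awk can list the individuals above a chosen missing-call-rate threshold. This is a sketch assuming the file is the whitespace-separated `.imiss` report with the header described above; the function name `high_missing_individuals` is hypothetical. Columns are located by header name rather than by position.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FID and IID of individuals whose missing call
# rate (F_MISS column of a plink .imiss file) is above a threshold.
# Usage: high_missing_individuals <file.imiss> <threshold>
high_missing_individuals() {
  awk -v max="$2" '
    NR == 1 {                      # header row: map column names to positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $(col["F_MISS"]) + 0 > max + 0 { print $(col["FID"]), $(col["IID"]) }
  ' "$1"
}
```

For example, `high_missing_individuals ${out}/mydata_missing.imiss 0.05` prints one `FID IID` pair per line; the file name and threshold are placeholders.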
+ | |||
+ | The information found in the '' | ||
+ | < | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | The columns are defined as follows: |
+ | < | ||
+ | SNP SNP identifier | ||
+ | CHR Chromosome number | ||
+ | N_MISS | ||
+ | N_GENO | ||
+ | F_MISS | ||
+ | </ | ||
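The per-SNP report can give a rough preview of how many SNPs a given missing-rate threshold would remove. This assumes the `.lmiss` layout described above; `count_high_missing_snps` is a hypothetical name, and the count is only indicative, because plink may apply other filters (such as the per-individual one) in the same run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SNPs in a plink .lmiss file whose missing call
# rate (F_MISS) is above a threshold. This is only a rough preview of a
# --geno filter with the same threshold, since other filters may interact.
# Usage: count_high_missing_snps <file.lmiss> <threshold>
count_high_missing_snps() {
  awk -v max="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header row
    $(col["F_MISS"]) + 0 > max + 0 { n++ }
    END { print n + 0 }            # print 0 when no SNP exceeds the threshold
  ' "$1"
}
```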
+ | |||
+ | We can generate a file with filters applied to the rate of missing data in individuals '' |
+ | |||
+ | The thresholds for these filters should be adjusted to suit each data set. |
+ | |||
+ | < | ||
+ | |||
+ | #### filter data ### |
+ |
+ | # --maf 0.01 removes SNPs with a minor allele frequency below 1% |
+ | plink --file ${file} \ |
+ |
+ | --maf 0.01 \ |
+ | | ||
+ | --out ${out}/ | ||
+ | | ||
+ | |||
+ | </ | ||
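The filtering step can also be wrapped so that the thresholds are easy to adjust per data set. This sketch assumes it runs on the binary fileset created earlier (hence `--bfile` rather than `--file`) and writes a new filtered binary fileset with `--make-bed`. The name `filter_data`, the `MIND`/`GENO`/`MAF` variables and their default values are hypothetical choices for illustration, apart from the 1% MAF cut-off taken from the command above. The flag descriptions sit above the command because a shell comment cannot follow a trailing `\` continuation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for QC filtering with plink 1.9.
# Default thresholds are illustrative only; override them per data set.
#   --mind : drop individuals with missing call rate above MIND
#   --geno : drop SNPs with missing call rate above GENO
#   --maf  : drop SNPs with minor allele frequency below MAF
#   --make-bed : write the filtered data as a new binary fileset
filter_data() {
  local bfile_prefix="$1" out_prefix="$2"
  plink --bfile "$bfile_prefix" \
    --mind "${MIND:-0.1}" \
    --geno "${GENO:-0.1}" \
    --maf "${MAF:-0.01}" \
    --make-bed \
    --out "$out_prefix"
}
```

For example, `GENO=0.05 filter_data ${out}/mydata ${out}/mydata_qc` overrides only the SNP missing-rate threshold; both prefixes are placeholders.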
===== Data analysis workflow with R and adegenet ===== | ===== Data analysis workflow with R and adegenet ===== | ||
tutorials/population-diversity/snp-chips.1600356135.txt.gz · Last modified: 2020/09/17 15:22 by bngina