tutorials:population-diversity:snp-chips

Differences

This shows you the differences between two versions of the page.

tutorials:population-diversity:snp-chips [2020/09/21 11:07] – [Data analysis workflow with Plink 1.9] bngina
tutorials:population-diversity:snp-chips [2020/09/22 10:21] (current) – [Data analysis workflow with Plink 1.9] bngina
Line 165: Line 165:
  
 <code> <code>
 +
 +
 +
 +######### summary statistics ########
  
 #missingness #missingness
Line 174: Line 178:
 </code> </code>
  
-This created two files.
+This creates two files.
  
   *//bin_caprin_60k.imiss// - for the individuals   *//bin_caprin_60k.imiss// - for the individuals
Line 181: Line 185:
  
  
 +The missing-data summary for the individuals, found in ''bin_caprin_60k.imiss'', looks like this:
 +<code>
 +FID                               IID MISS_PHENO   N_MISS   N_GENO   F_MISS
 +            WG6694108-DNA_A01_110kin          Y     1325    53347  0.02484
 +           WG6694108-DNA_A02_105kin1          Y     1346    53347  0.02523
 +             WG6694108-DNA_A03_55kin          Y     1313    53347  0.02461
 +             WG6694108-DNA_A04_50kin          Y     1360    53347  0.02549
 +            WG6694108-DNA_A05_104kin          Y     1350    53347  0.02531
 +             WG6694108-DNA_A06_82kin          Y     1412    53347  0.02647
 +            WG6694108-DNA_A07_75kin1          Y     1387    53347    0.026
 +           WG6694108-DNA_A08_110kin1          Y     1312    53347  0.02459
 +             WG6694108-DNA_A09_77kin          Y     1356    53347  0.02542
 +  10           WG6694108-DNA_A10_Zkin2          Y     1349    53347  0.02529
  
 +</code>
 +
 +The columns contain the following information:
 +<code>
 +FID                Family ID
 +IID                Individual ID
 +MISS_PHENO         Missing phenotype? (Y/N)
 +N_MISS             Number of missing SNPs for this individual
 +N_GENO             Number of non-obligatory missing genotypes, i.e. total number of SNPs used
 +F_MISS             Proportion of missing SNPs (N_MISS / N_GENO; a fraction, not a percentage)
 +</code>
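As a rough illustration of how the ''F_MISS'' column can be used, the sketch below lists the individuals whose missing rate exceeds a chosen cutoff. The function name `high_missing_individuals` and the cutoff value are hypothetical and not part of the tutorial; the columns are counted from the end of each row so that the output still lines up if the FID column is blank, as in the listing above.

```shell
# Hypothetical helper (not part of plink): print IID and F_MISS for every
# individual in a .imiss file whose F_MISS is above a cutoff.
#   $1 = path to the .imiss file
#   $2 = cutoff as a proportion (e.g. 0.025 for 2.5%)
high_missing_individuals() {
    # NR > 1 skips the header line.
    # F_MISS is the last field; IID is four fields before it.
    awk -v cutoff="$2" 'NR > 1 && $NF + 0 > cutoff + 0 { print $(NF-4), $NF }' "$1"
}
```

It could be called as `high_missing_individuals bin_caprin_60k.imiss 0.025`, assuming the file is in the current directory.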
 +
 +The per-SNP information in ''bin_caprin_60k.lmiss'' looks like this:
 +<code>
 + CHR                           SNP   N_MISS   N_GENO   F_MISS
 +             snp1-scaffold1-2170        4      648 0.006173
 +        snp1-scaffold708-1421224        8      648  0.01235
 +          snp10-scaffold1-352655        2      648 0.003086
 +     snp1000-scaffold1026-533890        0      648        0
 +    snp10000-scaffold1356-652219        4      648 0.006173
 +    snp10001-scaffold1356-703514        9      648  0.01389
 +    snp10002-scaffold1356-766996       10      648  0.01543
 +    snp10003-scaffold1356-808120        5      648 0.007716
 +    snp10004-scaffold1356-853276        3      648  0.00463
 +    snp10005-scaffold1356-907019        2      648 0.003086
 +</code>   
 +
 +The columns contain the following information:
 +<code>
 +CHR                Chromosome number
 +SNP                SNP identifier
 +N_MISS             Number of individuals missing this SNP
 +N_GENO             Number of non-obligatory missing genotypes, i.e. total number of individuals considered for this SNP
 +F_MISS             Proportion of individuals missing this SNP (N_MISS / N_GENO; a fraction, not a percentage)
 +</code>
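To get a feel for how many SNPs have a high missing rate, the ''F_MISS'' column of the ''.lmiss'' file can be summarised with a short awk call. This is only a sketch: the function name `count_high_missing_snps` and the cutoff are hypothetical, and ''F_MISS'' is read as the last field of each row so that a blank CHR column does not shift the result.

```shell
# Hypothetical helper (not part of plink): count the SNPs in a .lmiss file
# whose F_MISS is above a cutoff.
#   $1 = path to the .lmiss file
#   $2 = cutoff as a proportion (e.g. 0.05 for 5%)
count_high_missing_snps() {
    # NR > 1 skips the header line; n + 0 prints 0 when nothing matched.
    awk -v cutoff="$2" 'NR > 1 && $NF + 0 > cutoff + 0 { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_high_missing_snps bin_caprin_60k.lmiss 0.05` would report how many SNPs have more than 5% missing genotypes, assuming the file is in the current directory.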
 +
 +We can generate a filtered data set by applying thresholds for the rate of missing data per individual (''--mind''), the rate of missing genotypes per SNP (''--geno''), and the minor allele frequency //(MAF)// (''--maf'').
 +
 +The thresholds for these filters should be adjusted to suit each data set.
 +
 +<code>
 +
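 +#### filter data ###
 +
 +# --geno 0.05 : remove SNPs with more than 5% missing genotypes (95% call rate)
 +# --maf 0.01  : remove SNPs with a minor allele frequency below 1%
 +# --mind 0.25 : remove individuals with more than 25% missing data
 +# (a comment after a trailing backslash breaks the line continuation, so they are listed here)
 +plink --file "${file}" \
 + --geno 0.05 \
 + --maf 0.01 \
 + --mind 0.25 \
 + --out "${out}/bin_caprin_60k_fltrd" \
 + --make-bed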
 +
 +</code>
 ===== Data analysis workflow with R and adegenet ===== ===== Data analysis workflow with R and adegenet =====
  
tutorials/population-diversity/snp-chips.1600686446.txt.gz · Last modified: 2020/09/21 11:07 by bngina