mkatari-bioinformatics-august-2013-gatknotes
Differences
This shows you the differences between two versions of the page.
mkatari-bioinformatics-august-2013-gatknotes [2014/07/02 15:19] – mkatari | mkatari-bioinformatics-august-2013-gatknotes [2016/08/10 20:47] – mkatari | ||
---|---|---|---|
Line 36: | Line 36: | ||
< | < | ||
- | bowtie2 -x PTC_Human -U Cohen.fastq -S Cohen.sam | + | bowtie2 -x PTC_Human -U Sample1.fastq -S Sample1.sam |
- | samtools view -bS Cohen.sam > Cohen.bam | + | samtools view -bS Sample1.sam > Sample1.bam |
+ | |||
+ | bowtie2 -x PTC_Human -U Sample2.fastq -S Sample2.sam | ||
+ | samtools view -bS Sample2.sam > Sample2.bam | ||
+ | |||
+ | bowtie2 -x PTC_Human -U Sample3.fastq -S Sample3.sam | ||
+ | samtools view -bS Sample3.sam > Sample3.bam | ||
+ | |||
+ | bowtie2 -x PTC_Human -U Sample4.fastq -S Sample4.sam | ||
+ | samtools view -bS Sample4.sam > Sample4.bam | ||
</ | </ | ||
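The four align-and-convert pairs above differ only in the sample name, so they can also be generated by a loop. The following is a minimal sketch, assuming the FASTQ files are named exactly `Sample1.fastq` through `Sample4.fastq`; the helper name `print_align_cmds` is made up for illustration. It only prints the commands so they can be checked before anything is run.

```shell
# Dry-run helper (hypothetical name): prints the alignment and SAM-to-BAM
# conversion commands for every sample name passed as an argument.
print_align_cmds() {
    for sample in "$@"; do
        echo "bowtie2 -x PTC_Human -U ${sample}.fastq -S ${sample}.sam"
        echo "samtools view -bS ${sample}.sam > ${sample}.bam"
    done
}

# Inspect the commands first; append "| sh" to actually run them.
print_align_cmds Sample1 Sample2 Sample3 Sample4
```

Printing instead of executing keeps the sketch safe to try on a machine where bowtie2 and samtools are not installed.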
- | The Picard method of sorting is preferred by GATK | + | The Picard method of sorting is preferred by GATK. In some cases Picard uses the system temp directory while sorting, and you may hit an error complaining that it has run out of space. To avoid this problem, create your own tmp directory and tell Java to use it. See details [[https:// |
< | < | ||
- | java -jar / | + | module load picard/1.133 |
- | | + | |
- | | + | picard |
- | | + | |
- | | + | |
+ | picard SortSam | ||
+ | | ||
+ | |||
+ | picard SortSam INPUT=Sample3.bam OUTPUT=Sample3.sorted.bam \ | ||
+ | SORT_ORDER=coordinate | ||
+ | |||
+ | picard SortSam INPUT=Sample4.bam | ||
+ | SORT_ORDER=coordinate | ||
</ | </ | ||
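One possible way to point Picard at your own temp directory is sketched below. It assumes Picard is started directly through `java -jar` rather than through the `picard` module wrapper used above; `PICARD_TMP` and `PICARD_JAR` are placeholder names and paths, not values from these notes. `-Djava.io.tmpdir` is a standard Java system property and `TMP_DIR=` is a standard Picard option; whether the cluster's `picard` wrapper forwards Java options is not known here. The command is only printed, not run.

```shell
# Placeholder scratch location (assumption): choose a filesystem with enough free space.
PICARD_TMP="${PICARD_TMP:-$PWD/picard_tmp}"
mkdir -p "$PICARD_TMP"

# Placeholder path to the Picard jar (assumption): adjust for your installation.
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Build the sort command with both the Java temp dir and Picard's TMP_DIR set.
sort_cmd="java -Djava.io.tmpdir=$PICARD_TMP -jar $PICARD_JAR SortSam INPUT=Sample3.bam OUTPUT=Sample3.sorted.bam SORT_ORDER=coordinate TMP_DIR=$PICARD_TMP"

# Print the command for inspection; run it with: eval "$sort_cmd"
echo "$sort_cmd"
```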
Line 53: | Line 72: | ||
< | < | ||
+ | picard AddOrReplaceReadGroups \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
- | java -jar / | + | picard AddOrReplaceReadGroups \ |
- | | + | |
- | | + | |
- | RGLB=Cohen \ | + | RGLB=Sample2 |
| | ||
| | ||
- | RGSM=Cohen | + | RGSM=Sample2 |
+ | |||
+ | picard AddOrReplaceReadGroups \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | picard AddOrReplaceReadGroups \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
</ | </ | ||
This removes any reads that map to exactly the same place, which helps get rid of artifacts. | This removes any reads that map to exactly the same place, which helps get rid of artifacts. | ||
< | < | ||
- | java -jar / | + | |
- | | + | picard MarkDuplicates \ |
- | | + | |
- | | + | |
+ | | ||
| | ||
- | | + | |
+ | | ||
+ | |||
+ | |||
+ | picard MarkDuplicates \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | picard MarkDuplicates \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | picard MarkDuplicates \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
</ | </ | ||
Line 76: | Line 147: | ||
Index the files and realign them | Index the files and realign them | ||
< | < | ||
- | samtools index Cohen.dedup.bam | + | samtools index Sample1.dedup.bam |
+ | samtools index Sample2.dedup.bam | ||
+ | samtools index Sample3.dedup.bam | ||
+ | samtools index Sample4.dedup.bam | ||
# | # | ||
- | java -Xmx2g -jar /export/ | + | |
+ | module load gatk/3.3.0 | ||
+ | |||
+ | GenomeAnalysisTK \ | ||
-T RealignerTargetCreator \ | -T RealignerTargetCreator \ | ||
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
- | | + | |
- | | + | |
+ | GenomeAnalysisTK \ | ||
+ | -T IndelRealigner \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample1.dedup.bam \ | ||
+ | | ||
+ | -o Sample1.dedup.realign.bam | ||
+ | |||
+ | GenomeAnalysisTK \ | ||
+ | -T RealignerTargetCreator \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample2.dedup.bam \ | ||
+ | -o Sample2forIndelRealigner.intervals | ||
+ | |||
+ | GenomeAnalysisTK \ | ||
+ | -T IndelRealigner \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample2.dedup.bam \ | ||
+ | | ||
+ | -o Sample2.dedup.realign.bam | ||
+ | |||
+ | |||
+ | GenomeAnalysisTK \ | ||
+ | -T RealignerTargetCreator \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample3.dedup.bam \ | ||
+ | -o Sample3forIndelRealigner.intervals | ||
- | | + | GenomeAnalysisTK \ |
+ | -T IndelRealigner \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample3.dedup.bam \ | ||
+ | -targetIntervals Sample3forIndelRealigner.intervals \ | ||
+ | -o Sample3.dedup.realign.bam | ||
+ | |||
+ | GenomeAnalysisTK \ | ||
+ | -T RealignerTargetCreator \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample4.dedup.bam \ | ||
+ | -o Sample4forIndelRealigner.intervals | ||
+ | |||
+ | |||
+ | GenomeAnalysisTK \ | ||
-T IndelRealigner \ | -T IndelRealigner \ | ||
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
- | | + | |
- | -targetIntervals | + | |
- | | + | |
</ | </ | ||
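The index, target-creation, and realignment steps repeat the same three commands per sample. A minimal sketch of generating them in a loop follows, assuming the `GenomeAnalysisTK` wrapper from `module load gatk/3.3.0` is on the path and the file names follow the `SampleN.dedup.bam` pattern used above; the helper name `print_realign_cmds` is hypothetical. As a dry run, it prints the commands rather than executing them.

```shell
# Dry-run helper (hypothetical name): prints the index, RealignerTargetCreator,
# and IndelRealigner commands for one sample name.
print_realign_cmds() {
    s=$1
    ref=PTC_Human.fasta
    intervals="${s}forIndelRealigner.intervals"
    echo "samtools index ${s}.dedup.bam"
    echo "GenomeAnalysisTK -T RealignerTargetCreator -R ${ref} -I ${s}.dedup.bam -o ${intervals}"
    echo "GenomeAnalysisTK -T IndelRealigner -R ${ref} -I ${s}.dedup.bam -targetIntervals ${intervals} -o ${s}.dedup.realign.bam"
}

# Print the commands for all four samples; append "| sh" after "done" to run them.
for sample in Sample1 Sample2 Sample3 Sample4; do
    print_realign_cmds "$sample"
done
```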
Line 99: | Line 216: | ||
< | < | ||
- | java -jar / | + | picard CleanSam \ |
- | | + | |
- | | + | |
</ | </ | ||
Line 107: | Line 224: | ||
< | < | ||
- | java -jar / | + | picard MergeSamFiles |
- | | + | |
- | | + | |
- | | + | |
+ | | ||
+ | | ||
- | samtools sort ShermanCohenMerged.bam ShermanCohenMerged.sorted | + | picard SortSam INPUT=AllMerged.bam OUTPUT=AllMerged.sorted.bam SORT_ORDER=coordinate |
+ | |||
+ | samtools index AllMerged.sorted.bam | ||
- | samtools index ShermanCohenMerged.sorted.bam | ||
</ | </ | ||
Line 121: | Line 241: | ||
< | < | ||
- | java -jar / | + | GenomeAnalysisTK -T UnifiedGenotyper \ |
- | -T UnifiedGenotyper \ | + | |
- | | + | |
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
| | ||
Line 140: | Line 259: | ||
If you would like to generate a table from the VCF file, use the following command | If you would like to generate a table from the VCF file, use the following command | ||
< | < | ||
- | java -jar / | + | GenomeAnalysisTK \ |
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
-T VariantsToTable \ | -T VariantsToTable \ | ||
Line 147: | Line 266: | ||
-GF GT -GF GQ \ | -GF GT -GF GQ \ | ||
-o PTC_human.gatk.vcf.table | -o PTC_human.gatk.vcf.table | ||
+ | </ | ||
+ | |||
+ | In order to filter your vcf file based on quality measures, depth, and also statistical significance, | ||
+ | |||
+ | < | ||
+ | GenomeAnalysisTK \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -T VariantFiltration \ | ||
+ | -o PTC_human.gatk.filter.vcf \ | ||
+ | --variant PTC_human.gatk.vcf \ | ||
+ | --filterExpression "QD < 2.0 || MQ < 40.0 || FS > 60.0 || HaplotypeScore > | ||
+ | --filterName " | ||
+ | |||
+ | </ | ||
+ | |||
+ | Good descriptions of the different information on vcf files [[https:// | ||
+ | |||
+ | Finally, to save the SNPs that passed your filter, use the SelectVariants tool. | ||
+ | |||
+ | < | ||
+ | |||
+ | GenomeAnalysisTK \ | ||
+ | -T SelectVariants \ | ||
+ | --variant PTC_human.gatk.filter.vcf \ | ||
+ | -o PTC_human.gatk.filter.only.vcf \ | ||
+ | -ef \ | ||
+ | -R PTC_Human.fasta | ||
+ | |||
</ | </ |
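As a quick sanity check on the filtering steps, the records marked PASS in the filtered VCF can be counted and compared with the number of records left after SelectVariants. This is a minimal sketch relying only on the VCF layout, where the seventh tab-separated column is FILTER; the function names `count_pass` and `count_records` are made up for illustration.

```shell
# Count data records whose FILTER column (7th tab-separated field) is PASS.
count_pass() {
    awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Count all non-empty data records (lines that do not start with '#').
count_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Only runs if the files produced by the steps above are present.
if [ -f PTC_human.gatk.filter.vcf ]; then
    count_pass PTC_human.gatk.filter.vcf
fi
if [ -f PTC_human.gatk.filter.only.vcf ]; then
    count_records PTC_human.gatk.filter.only.vcf
fi
```

If every record in the filtered file carries either PASS or a filter name, the two printed numbers should agree.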
mkatari-bioinformatics-august-2013-gatknotes.txt · Last modified: 2016/08/17 08:37 by mkatari