mkatari-bioinformatics-august-2013-gatknotes
Differences
This shows you the differences between two versions of the page.
mkatari-bioinformatics-august-2013-gatknotes [2014/06/11 13:36] – mkatari | mkatari-bioinformatics-august-2013-gatknotes [2016/08/17 08:37] (current) – mkatari | ||
---|---|---|---|
Line 12: | Line 12: | ||
module load bowtie2 | module load bowtie2 | ||
module load samtools | module load samtools | ||
+ | module load picard | ||
bowtie2-build PTC_Human.fasta PTC_Human | bowtie2-build PTC_Human.fasta PTC_Human | ||
samtools faidx PTC_Human.fasta | samtools faidx PTC_Human.fasta | ||
- | java -jar / | + | picard CreateSequenceDictionary \ |
| | ||
| | ||
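Before aligning, it can help to confirm that the preparation commands produced every file the later steps look for. The helper below is a hypothetical sketch (`check_reference` is not part of any tool). It assumes the sequence dictionary was written as `PTC_Human.dict`, because the CreateSequenceDictionary output name is cut off above. It also assumes the standard bowtie2 index names; for very large references bowtie2-build writes `.bt2l` files instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list reference files that later steps expect.
# Assumes PTC_Human.dict as the CreateSequenceDictionary output name and the
# standard small-genome bowtie2 index suffixes (.bt2, not .bt2l).
set -euo pipefail

# Usage: check_reference <base>   e.g. check_reference PTC_Human
# Prints "missing: <file>" for each absent file; returns 1 if any are missing.
check_reference() {
  local base="$1"
  local missing=0
  local f
  for f in "${base}.fasta" "${base}.fasta.fai" "${base}.dict" \
           "${base}.1.bt2" "${base}.2.bt2" "${base}.3.bt2" "${base}.4.bt2" \
           "${base}.rev.1.bt2" "${base}.rev.2.bt2"; do
    if [[ ! -e "$f" ]]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_reference PTC_Human` in the working directory prints nothing and returns 0 when everything is in place.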
Line 36: | Line 37: | ||
< | < | ||
- | bowtie2 -x PTC_Human -U Cohen.fastq -S Cohen.sam | + | bowtie2 -x PTC_Human -U Sample1.fastq -S Sample1.sam |
- | samtools view -bS Cohen.sam > Cohen.bam | + | samtools view -bS Sample1.sam > Sample1.bam |
- | bowtie2 -x PTC_Human -U Sherman.fastq -S Sherman.sam | + | |
- | samtools view -bS Sherman.sam > Sherman.bam | + | bowtie2 -x PTC_Human -U Sample2.fastq -S Sample2.sam |
+ | samtools view -bS Sample2.sam > Sample2.bam | ||
+ | |||
+ | bowtie2 -x PTC_Human -U Sample3.fastq -S Sample3.sam | ||
+ | samtools view -bS Sample3.sam > Sample3.bam | ||
+ | |||
+ | bowtie2 -x PTC_Human -U Sample4.fastq -S Sample4.sam | ||
+ | samtools view -bS Sample4.sam > Sample4.bam | ||
</ | </ | ||
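The four alignment blocks repeat the same two commands, so a loop is less error-prone when samples are added. This is a minimal sketch, not the author's script. `align_all` and the `DRY_RUN` switch are hypothetical. It uses `samtools view -o` instead of the `>` redirect so that a dry run prints the complete command.

```bash
#!/usr/bin/env bash
# Sketch: the per-sample bowtie2 + samtools commands as one loop.
# align_all and DRY_RUN are hypothetical conveniences, not part of any tool.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: align_all Sample1 Sample2 ...
# Aligns <sample>.fastq against the PTC_Human index, then converts SAM to BAM.
align_all() {
  local s
  for s in "$@"; do
    run bowtie2 -x PTC_Human -U "${s}.fastq" -S "${s}.sam"
    # -o writes the BAM directly, equivalent to "> ${s}.bam" in the notes.
    run samtools view -bS -o "${s}.bam" "${s}.sam"
  done
}
```

After loading the bowtie2 and samtools modules, run `align_all Sample1 Sample2 Sample3 Sample4`. Prefixing it with `DRY_RUN=1` only prints the commands.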
- | The picard method to sort is preferred by GATK | + | The Picard method of sorting is preferred by GATK. Picard writes temporary files while sorting, so if the default temporary directory is small you may get an error about running out of space. To avoid this, create your own tmp directory and tell Java to use it. See details [[https:// |
< | < | ||
- | java -jar / | + | module load picard/1.133 |
- | | + | |
- | OUTPUT=Cohen.sorted.bam \ | + | picard |
- | | + | |
- | | + | |
- | java -jar / | + | picard SortSam |
- | | + | SORT_ORDER=coordinate |
- | OUTPUT=Sherman.sorted.bam \ | + | |
- | | + | picard SortSam |
+ | | ||
+ | |||
+ | picard SortSam | ||
+ | SORT_ORDER=coordinate | ||
+ | |||
</ | </ | ||
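One way to follow the tmp-directory advice is Picard's standard `TMP_DIR` option, shown below in a loop. This sketch makes several assumptions. The `picard` wrapper from `module load picard/1.133` is assumed to pass its arguments through to picard.jar. Outputs are assumed to be named `<sample>.sorted.bam`, as in the earlier revision. `sort_all` and the directory you pass are hypothetical. When calling Java directly, `-Djava.io.tmpdir=<dir>` does the same job.

```bash
#!/usr/bin/env bash
# Sketch: coordinate-sort every sample with a private Picard temp directory.
# Assumes the "picard" module wrapper forwards arguments to picard.jar.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: sort_all <tmp_dir> Sample1 Sample2 ...
# Sorts <sample>.bam into <sample>.sorted.bam (hypothetical naming), keeping
# Picard's temporary files in <tmp_dir> instead of the system default.
sort_all() {
  local tmpdir="$1"
  shift
  mkdir -p "$tmpdir"
  local s
  for s in "$@"; do
    run picard SortSam INPUT="${s}.bam" OUTPUT="${s}.sorted.bam" \
      SORT_ORDER=coordinate TMP_DIR="$tmpdir"
  done
}
```

For example, `sort_all "$HOME/picard_tmp" Sample1 Sample2 Sample3 Sample4` keeps the temporary files in your home directory.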
Line 58: | Line 73: | ||
< | < | ||
- | java -jar / | + | picard AddOrReplaceReadGroups \ |
- | | + | |
- | | + | |
- | RGLB=Sherman | + | RGLB=Sample1 |
| | ||
| | ||
- | RGSM=Sherman | + | RGSM=Sample1 |
- | java -jar / | + | picard AddOrReplaceReadGroups \ |
- | | + | |
- | | + | |
- | RGLB=Cohen \ | + | RGLB=Sample2 |
| | ||
| | ||
- | RGSM=Cohen | + | RGSM=Sample2 |
+ | |||
+ | picard AddOrReplaceReadGroups \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | picard AddOrReplaceReadGroups \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
</ | </ | ||
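The read-group commands above are partly cut off, so the loop below fills in assumed values. Only RGLB and RGSM (set to the sample name) come from the notes. RGID and RGPU are hypothetical. The platform (for example ILLUMINA) is passed as an argument. The file names `<sample>.sorted.bam` and `<sample>.rg.bam` are placeholders. Picard 1.x expects RGLB, RGPL, RGPU and RGSM to be set, so none of them is left out.

```bash
#!/usr/bin/env bash
# Sketch: add read groups to every sample. RGID, RGPU and the .rg.bam output
# name are hypothetical; RGLB and RGSM follow the notes (the sample name).
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: add_read_groups <platform> Sample1 Sample2 ...
add_read_groups() {
  local platform="$1"
  shift
  local s
  for s in "$@"; do
    run picard AddOrReplaceReadGroups \
      INPUT="${s}.sorted.bam" OUTPUT="${s}.rg.bam" \
      RGID="$s" RGLB="$s" RGPL="$platform" RGPU="${s}_unit1" RGSM="$s"
  done
}
```

RGSM is the value GATK uses as the sample name in the final VCF, so it should differ between samples.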
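This marks reads that map to exactly the same start position and orientation (and removes them if REMOVE_DUPLICATES=true is set). Such duplicates are usually PCR or optical artifacts. | This marks reads that map to exactly the same start position and orientation (and removes them if REMOVE_DUPLICATES=true is set). Such duplicates are usually PCR or optical artifacts. | ||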
< | < | ||
- | java -jar / | + | |
- | | + | picard MarkDuplicates \ |
- | | + | |
- | | + | |
+ | | ||
| | ||
- | | + | |
+ | | ||
- | java -jar / | + | |
- | | + | picard MarkDuplicates \ |
- | | + | |
- | | + | |
+ | | ||
| | ||
- | | + | |
+ | | ||
+ | |||
+ | picard MarkDuplicates \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | picard MarkDuplicates \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
</ | </ | ||
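Here is a loop version of the MarkDuplicates step, as a sketch. `METRICS_FILE` is required by MarkDuplicates. `REMOVE_DUPLICATES=true` is included because the notes describe the step as removing reads; without it Picard only flags them. The original options are cut off, though, so treat that option as an assumption, along with the `<sample>.rg.bam` input and the metrics file names. The `<sample>.dedup.bam` output name matches the indexing step below.

```bash
#!/usr/bin/env bash
# Sketch: MarkDuplicates for every sample. Input and metrics names are
# hypothetical; REMOVE_DUPLICATES=true is an assumption (see text).
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: mark_duplicates_all Sample1 Sample2 ...
mark_duplicates_all() {
  local s
  for s in "$@"; do
    run picard MarkDuplicates \
      INPUT="${s}.rg.bam" OUTPUT="${s}.dedup.bam" \
      METRICS_FILE="${s}.dup_metrics.txt" REMOVE_DUPLICATES=true
  done
}
```

The metrics file reports the fraction of duplicate reads per library, which is a quick way to spot an over-amplified sample.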
Index the files and realign them | Index the files and realign them | ||
< | < | ||
- | samtools index Cohen.dedup.bam | + | samtools index Sample1.dedup.bam |
- | samtools index Sherman.dedup.bam | + | samtools index Sample2.dedup.bam |
+ | samtools index Sample3.dedup.bam | ||
+ | samtools index Sample4.dedup.bam | ||
# | # | ||
- | java -Xmx2g -jar /export/ | + | |
+ | module load gatk/3.3.0 | ||
+ | |||
+ | GenomeAnalysisTK \ | ||
-T RealignerTargetCreator \ | -T RealignerTargetCreator \ | ||
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
- | | + | |
- | | + | |
- | # | + | |
- | java -Xmx2g -jar / | + | GenomeAnalysisTK \ |
+ | -T IndelRealigner \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample1.dedup.bam \ | ||
+ | -targetIntervals Sample1forIndelRealigner.intervals \ | ||
+ | -o Sample1.dedup.realign.bam | ||
+ | |||
+ | GenomeAnalysisTK | ||
-T RealignerTargetCreator \ | -T RealignerTargetCreator \ | ||
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
- | | + | |
- | | + | |
+ | GenomeAnalysisTK \ | ||
+ | -T IndelRealigner \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample2.dedup.bam \ | ||
+ | | ||
+ | -o Sample2.dedup.realign.bam | ||
+ | |||
+ | |||
+ | GenomeAnalysisTK \ | ||
+ | -T RealignerTargetCreator \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample3.dedup.bam \ | ||
+ | -o Sample3forIndelRealigner.intervals | ||
- | java -Xmx4g -jar / | + | GenomeAnalysisTK \ |
-T IndelRealigner \ | -T IndelRealigner \ | ||
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
- | | + | |
- | -targetIntervals | + | |
- | | + | |
- | | + | GenomeAnalysisTK \ |
+ | -T RealignerTargetCreator \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -I Sample4.dedup.bam \ | ||
+ | -o Sample4forIndelRealigner.intervals | ||
+ | |||
+ | |||
+ | GenomeAnalysisTK | ||
-T IndelRealigner \ | -T IndelRealigner \ | ||
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
- | | + | |
- | -targetIntervals | + | |
- | | + | |
</ | </ | ||
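Every file name in the indexing and realignment commands follows the same per-sample pattern, so the three commands can run in one loop. This sketch assumes the `GenomeAnalysisTK` wrapper from `module load gatk/3.3.0`. The function name `realign_all` is hypothetical; the file names are the ones used above.

```bash
#!/usr/bin/env bash
# Sketch: index, find realignment targets, and realign each sample.
# Assumes the GenomeAnalysisTK wrapper provided by the gatk/3.3.0 module.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: realign_all <reference.fasta> Sample1 Sample2 ...
realign_all() {
  local ref="$1"
  shift
  local s
  for s in "$@"; do
    run samtools index "${s}.dedup.bam"
    run GenomeAnalysisTK -T RealignerTargetCreator -R "$ref" \
      -I "${s}.dedup.bam" -o "${s}forIndelRealigner.intervals"
    run GenomeAnalysisTK -T IndelRealigner -R "$ref" \
      -I "${s}.dedup.bam" -targetIntervals "${s}forIndelRealigner.intervals" \
      -o "${s}.dedup.realign.bam"
  done
}
```

Usage: `realign_all PTC_Human.fasta Sample1 Sample2 Sample3 Sample4`.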
- | Now we merge the bam files and then sort and index them | + | In some cases there may be a need to clean the sam/bam file(s) (soft-trimming the coordinates). To do this use CleanSam in Picard tools. You may want to just do it to all to avoid the error in a workflow, but it may not be necessary. |
< | < | ||
- | java -jar / | + | picard |
- | | + | |
- | | + | |
- | | + | </code> |
+ | |||
+ | Now we merge the bam files and then sort and index the merged file. If you cleaned the bam files, remember to use the cleaned ones. | ||
+ | |||
+ | < | ||
+ | picard | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | picard SortSam INPUT=AllMerged.bam OUTPUT=AllMerged.sorted.bam SORT_ORDER=coordinate | ||
- | samtools | + | samtools |
- | samtools index ShermanCohenMerged.sorted.bam | ||
</ | </ | ||
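The merge command is cut off above. Assuming it is Picard's MergeSamFiles, which takes one `INPUT=` per file, the merge, sort and index steps can be wrapped as follows. The SortSam line matches the notes. `merge_sort_index` is hypothetical; pass it the cleaned BAMs if you ran CleanSam.

```bash
#!/usr/bin/env bash
# Sketch: merge several BAMs (assumed MergeSamFiles), then sort and index.
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: merge_sort_index <out_prefix> <bam> [<bam> ...]
merge_sort_index() {
  if (( $# < 2 )); then
    echo "usage: merge_sort_index <out_prefix> <bam> [<bam> ...]" >&2
    return 1
  fi
  local prefix="$1"
  shift
  local inputs=() f
  for f in "$@"; do
    inputs+=("INPUT=$f")   # one INPUT= argument per BAM
  done
  run picard MergeSamFiles "${inputs[@]}" OUTPUT="${prefix}.bam"
  run picard SortSam INPUT="${prefix}.bam" OUTPUT="${prefix}.sorted.bam" SORT_ORDER=coordinate
  run samtools index "${prefix}.sorted.bam"
}
```

For example: `merge_sort_index AllMerged Sample1.dedup.realign.bam Sample2.dedup.realign.bam Sample3.dedup.realign.bam Sample4.dedup.realign.bam`.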
- | Finall | + | Finally |
< | < | ||
- | java -jar / | + | GenomeAnalysisTK -T UnifiedGenotyper \ |
- | -T UnifiedGenotyper \ | + | |
- | | + | |
-R PTC_Human.fasta \ | -R PTC_Human.fasta \ | ||
| | ||
Line 153: | Line 249: | ||
-glm SNP \ | -glm SNP \ | ||
-o PTC_human.gatk.vcf | -o PTC_human.gatk.vcf | ||
+ | |||
+ | </ | ||
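A quick sanity check on the genotyper output is to count records per chromosome, skipping the `#` header lines. This awk sketch assumes a plain-text (uncompressed), tab-separated VCF. `vcf_counts_by_chrom` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: count VCF records per chromosome (column 1), ignoring header lines.
set -euo pipefail

# Usage: vcf_counts_by_chrom file.vcf
vcf_counts_by_chrom() {
  awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" |
    LC_ALL=C sort
}
```

For example, `vcf_counts_by_chrom PTC_human.gatk.vcf` prints one `chromosome<TAB>count` line per reference sequence that has calls.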
+ | |||
+ | If you want to load the vcf file into IGV, remember to index it first. | ||
+ | < | ||
+ | module load igvtools | ||
+ | igvtools index PTC_human.gatk.vcf | ||
+ | </ | ||
+ | |||
+ | If you would like to generate a table from the vcf file, use the following command: | ||
+ | < | ||
+ | GenomeAnalysisTK \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -T VariantsToTable \ | ||
+ | -V PTC_human.gatk.vcf \ | ||
+ | -F CHROM -F POS -F ID -F QUAL -F AC \ | ||
+ | -GF GT -GF GQ \ | ||
+ | -o PTC_human.gatk.vcf.table | ||
+ | </ | ||
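The VariantsToTable output is tab-separated with a header row, so individual columns are easy to summarize. As a sketch, the hypothetical `mean_column` helper below averages a named column such as QUAL. It skips non-numeric entries such as `NA` or `.`.

```bash
#!/usr/bin/env bash
# Sketch: average one named column of a tab-separated table with a header.
set -euo pipefail

# Usage: mean_column <table> <column_name>   e.g. mean_column t.table QUAL
mean_column() {
  awk -F'\t' -v name="$2" '
    # Find the requested column in the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) c = i; next }
    # Accumulate numeric values only (skips NA and ".").
    c && $c ~ /^[-+0-9.eE]+$/ && $c != "." { sum += $c; n++ }
    END {
      if (!c) { print "column not found: " name > "/dev/stderr"; exit 1 }
      if (n) printf "%.2f\n", sum / n; else print "NA"
    }' "$1"
}
```

For example: `mean_column PTC_human.gatk.vcf.table QUAL`.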
+ | |||
+ | To filter your vcf file based on quality measures, read depth, and statistical significance, |
+ | |||
+ | < | ||
+ | GenomeAnalysisTK \ | ||
+ | -R PTC_Human.fasta \ | ||
+ | -T VariantFiltration \ | ||
+ | -o PTC_human.gatk.filter.vcf \ | ||
+ | --variant PTC_human.gatk.vcf \ | ||
+ | --filterExpression " | ||
+ | --filterName mannyfilter | ||
+ | |||
+ | </ | ||
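VariantFiltration does not drop records. Instead, it writes PASS or the filter name (here `mannyfilter`) into the FILTER column, which is column 7. Counting the values in that column shows how many sites the expression flagged. Below is a minimal awk sketch; `filter_summary` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: tally the FILTER column (7) of a VCF, ignoring header lines.
set -euo pipefail

# Usage: filter_summary file.vcf
filter_summary() {
  awk -F'\t' '!/^#/ { c[$7]++ } END { for (k in c) print k "\t" c[k] }' "$1" |
    LC_ALL=C sort
}
```

For example: `filter_summary PTC_human.gatk.filter.vcf`.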
+ | |||
+ | Good descriptions of the different fields in vcf files [[https:// |
+ | |||
+ | Finally, to keep only the SNPs that passed your filter, use the SelectVariants tool with the -ef (--excludeFiltered) option. | ||
+ | |||
+ | < | ||
+ | |||
+ | GenomeAnalysisTK \ | ||
+ | -T SelectVariants \ | ||
+ | --variant PTC_human.gatk.filter.vcf \ | ||
+ | -o PTC_human.gatk.filter.only.vcf \ | ||
+ | -ef \ | ||
+ | -R PTC_Human.fasta | ||
+ | |||
</ | </ |
mkatari-bioinformatics-august-2013-gatknotes.1402493777.txt.gz · Last modified: 2014/06/11 13:36 by mkatari