mkatari-bioinformatics-august-2013-gatknotes

Differences

This shows you the differences between two versions of the page.

mkatari-bioinformatics-august-2013-gatknotes [2014/06/11 12:49] – created mkatarimkatari-bioinformatics-august-2013-gatknotes [2014/07/09 07:18] mkatari
Line 15: Line 15:
 bowtie2-build PTC_Human.fasta PTC_Human bowtie2-build PTC_Human.fasta PTC_Human
 samtools faidx PTC_Human.fasta samtools faidx PTC_Human.fasta
-java -jar /export/apps/picard-tools/1.112/CreateSequenceDictionary.jar R=PTC_Human.fasta O=PTC_Human.dict+java -jar /export/apps/picard-tools/1.112/CreateSequenceDictionary.jar \
 +   R=PTC_Human.fasta \
 +   O=PTC_Human.dict
  
 </code> </code>
Line 27: Line 29:
   * create a bam index for dedup bam files   * create a bam index for dedup bam files
   * realign samples   * realign samples
 +
 +Once they are all processed
   * merge all samples to one bam file   * merge all samples to one bam file
   * sort and index this merged bam file   * sort and index this merged bam file
Line 34: Line 38:
 bowtie2 -x PTC_Human -U Cohen.fastq -S Cohen.sam bowtie2 -x PTC_Human -U Cohen.fastq -S Cohen.sam
 samtools view -bS Cohen.sam > Cohen.bam samtools view -bS Cohen.sam > Cohen.bam
-bowtie2 -x PTC_Human -U Sherman.fastq -S Sherman.sam 
-samtools view -bS Sherman.sam > Sherman.bam 
 </code> </code>
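The commands are now shown for one sample (Cohen) only, so the same steps have to be repeated for every other sample. A minimal sketch of one way to do that, assuming each sample has a `<sample>.fastq` file in the working directory. `print_align_cmds` is a hypothetical helper, not part of the original notes, and it only prints the commands so they can be checked before running them.

```shell
# Sketch: print the alignment commands for each sample instead of
# copying them by hand. print_align_cmds is a hypothetical helper.
print_align_cmds() {
    # $1: sample name; assumes <sample>.fastq exists in the working directory
    sample="$1"
    echo "bowtie2 -x PTC_Human -U ${sample}.fastq -S ${sample}.sam"
    echo "samtools view -bS ${sample}.sam > ${sample}.bam"
}

# Sample names taken from the notes; extend the list as needed.
for s in Cohen Sherman; do
    print_align_cmds "$s"
done
```

Piping the printed lines to `sh` would run them; printing first makes it easier to spot a wrong file name.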
  
-The picard method to sort is preferred by GATK+The picard method to sort is preferred by GATK. In some cases Picard uses the system temp directory while sorting, and the run may fail with an error about running out of space. To avoid this, create your own tmp directory and tell Java to use it with -Djava.io.tmpdir. See details [[https://www.biostars.org/p/42613/|here]]. 
 <code> <code>
-java -jar /export/apps/picard-tools/1.112/SortSam.jar INPUT=Cohen.bam OUTPUT=Cohen.sorted.bam SORT_ORDER=coordinate +mkdir /var/scratch/mkatari 
-java -jar /export/apps/picard-tools/1.112/SortSam.jar INPUT=Sherman.bam OUTPUT=Sherman.sorted.bam SORT_ORDER=coordinate+mkdir /var/scratch/mkatari/tmp 
 + 
 +java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/picard-tools/1.112/SortSam.jar \
 +   INPUT=Cohen.bam \
 +   OUTPUT=Cohen.sorted.bam \
 +   SORT_ORDER=coordinate
 +    
 </code> </code>
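The two `mkdir` calls fail if the directories already exist, which is awkward when the commands are rerun. A minimal sketch of a rerunnable setup, assuming a scratch location you are allowed to write to. `make_java_tmp` and `SCRATCH_ROOT` are hypothetical names introduced here; the notes use /var/scratch/mkatari.

```shell
# Sketch: create a private temp directory for Java in one rerunnable step.
# make_java_tmp and SCRATCH_ROOT are hypothetical names.
make_java_tmp() {
    # $1: base directory under which tmp/ is created
    base="$1"
    mkdir -p "$base/tmp" || return 1   # -p creates parents and tolerates existing directories
    printf '%s\n' "$base/tmp"
}

# Falls back to a throwaway directory when SCRATCH_ROOT is not set.
SCRATCH_ROOT="${SCRATCH_ROOT:-$(mktemp -d)}"
JAVA_TMP="$(make_java_tmp "$SCRATCH_ROOT")"

# Print the SortSam command from the notes with the temp directory filled in.
echo "java -Djava.io.tmpdir=$JAVA_TMP -jar /export/apps/picard-tools/1.112/SortSam.jar INPUT=Cohen.bam OUTPUT=Cohen.sorted.bam SORT_ORDER=coordinate"
```

Setting `SCRATCH_ROOT=/var/scratch/mkatari` before running reproduces the path used in the rest of these notes.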
  
Line 47: Line 57:
  
 <code> <code>
-java -jar /export/apps/picard-tools/1.112/AddOrReplaceReadGroups.jar INPUT=Sherman.sorted.bam OUTPUT=ShermanRG.bam RGLB=Sherman RGPL=IonTorrent RGPU=None RGSM=Sherman 
  
-java -jar /export/apps/picard-tools/1.112/AddOrReplaceReadGroups.jar INPUT=Cohen.sorted.bam OUTPUT=CohenRG.bam RGLB=Cohen RGPL=IonTorrent RGPU=None RGSM=Cohen+java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/picard-tools/1.112/AddOrReplaceReadGroups.jar \
 +   INPUT=Cohen.sorted.bam \
 +   OUTPUT=CohenRG.bam \
 +   RGLB=Cohen \
 +   RGPL=IonTorrent \
 +   RGPU=None \
 +   RGSM=Cohen
 </code> </code>
  
 This will remove any reads that map to the same exact place. It is helpful to get rid of artifacts.  This will remove any reads that map to the same exact place. It is helpful to get rid of artifacts. 
 <code> <code>
-java -jar /export/apps/picard-tools/1.112/MarkDuplicates.jar INPUT=CohenRG.bam OUTPUT=Cohen.dedup.bam METRICS_FILE=Cohen.dedup.metrics REMOVE_DUPLICATES=TRUE ASSUME_SORTED=TRUE 
  
-java -jar /export/apps/picard-tools/1.112/MarkDuplicates.jar INPUT=ShermanRG.bam OUTPUT=Sherman.dedup.bam METRICS_FILE=Sherman.dedup.metrics REMOVE_DUPLICATES=TRUE ASSUME_SORTED=TRUE+java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/picard-tools/1.112/MarkDuplicates.jar \
 +   INPUT=CohenRG.bam \
 +   OUTPUT=Cohen.dedup.bam \
 +   METRICS_FILE=Cohen.dedup.metrics \
 +   REMOVE_DUPLICATES=TRUE \
 +   ASSUME_SORTED=TRUE \
 +   MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000
 </code> </code>
  
Line 62: Line 83:
 <code> <code>
 samtools index Cohen.dedup.bam  samtools index Cohen.dedup.bam 
-samtools index Sherman.dedup.bam  
  
 #identifying indels #identifying indels
-java -Xmx2g -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \+java -Xmx2g -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \    -T RealignerTargetCreator \
    -R PTC_Human.fasta \    -R PTC_Human.fasta \
Line 71: Line 91:
    -o CohenforIndelRealigner.intervals    -o CohenforIndelRealigner.intervals
    
- #identifying indels 
-java -Xmx2g -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \ 
-   -T RealignerTargetCreator \ 
-   -R PTC_Human.fasta \ 
-   -I Sherman.dedup.bam \ 
-   -o ShermanforIndelRealigner.intervals 
  
    
- java -Xmx4g -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \+ java -Xmx4g -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \
    -T IndelRealigner \    -T IndelRealigner \
    -R PTC_Human.fasta \    -R PTC_Human.fasta \
Line 86: Line 100:
    -o Cohen.dedup.realign.bam    -o Cohen.dedup.realign.bam
  
- java -Xmx4g -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \ +</code>
-   -T IndelRealigner \ +
-   -R PTC_Human.fasta \ +
-   -I Sherman.dedup.bam \ +
-  -targetIntervals ShermanforIndelRealigner.intervals \ +
-   -o Sherman.dedup.realign.bam+
  
 +In some cases the sam/bam file(s) need cleaning (soft-clipping alignments that run past the end of the reference). Use CleanSam in Picard tools for this. In a workflow you may want to clean every file to avoid the error, although it is not always necessary.
 +
 +<code>
 +java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/picard-tools/1.112/CleanSam.jar \
 +   INPUT=Sherman.dedup.realign.bam \
 +   OUTPUT=Sherman.clean.dedup.realign.bam
 </code> </code>
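If every file is cleaned as part of a workflow, the output name can be derived from the input name rather than typed each time. A minimal sketch, assuming sample names contain no dots so that the first dot separates the sample name from the rest. `clean_name` and `print_clean_cmd` are hypothetical helpers, and the command is only printed, not run.

```shell
# Sketch: derive the CleanSam output name and print the command per file.
# clean_name and print_clean_cmd are hypothetical helpers.
clean_name() {
    # Sherman.dedup.realign.bam -> Sherman.clean.dedup.realign.bam
    # ${1%%.*} keeps the text before the first dot, ${1#*.} the text after it.
    printf '%s\n' "${1%%.*}.clean.${1#*.}"
}

print_clean_cmd() {
    # $1: input bam file name
    in_bam="$1"
    out_bam="$(clean_name "$in_bam")"
    echo "java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/picard-tools/1.112/CleanSam.jar INPUT=$in_bam OUTPUT=$out_bam"
}

# Sample names taken from the notes.
for s in Cohen Sherman; do
    print_clean_cmd "${s}.dedup.realign.bam"
done
```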
  
-Now we merge the bam files and then sort and index them+Now we merge the bam files and then sort and index them. If you cleaned any of the bam files, remember to use the cleaned versions.
  
 <code> <code>
-java -jar /export/apps/picard-tools/1.112/MergeSamFiles.jar INPUT=Sherman.dedup.realign.bam INPUT=Cohen.dedup.realign.bam OUTPUT=ShermanCohenMerged.bam  +java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/picard-tools/1.112/MergeSamFiles.jar \
 +   INPUT=Sherman.clean.dedup.realign.bam \
 +   INPUT=Cohen.dedup.realign.bam \
 +   OUTPUT=ShermanCohenMerged.bam
 samtools sort ShermanCohenMerged.bam ShermanCohenMerged.sorted samtools sort ShermanCohenMerged.bam ShermanCohenMerged.sorted
 +
 samtools index ShermanCohenMerged.sorted.bam  samtools index ShermanCohenMerged.sorted.bam 
 </code> </code>
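When only some samples were cleaned, it is easy to pass the wrong file to the merge. A minimal sketch that picks the cleaned file when it exists and falls back to the realigned one otherwise, assuming the file naming used in these notes. `pick_bam` and `merge_args` are hypothetical helpers that only build the `INPUT=` arguments.

```shell
# Sketch: build MergeSamFiles INPUT= arguments, preferring cleaned bam files.
# pick_bam and merge_args are hypothetical helpers.
pick_bam() {
    # $1: sample name; prefers the CleanSam output when it exists
    if [ -f "$1.clean.dedup.realign.bam" ]; then
        echo "$1.clean.dedup.realign.bam"
    else
        echo "$1.dedup.realign.bam"
    fi
}

merge_args() {
    # prints one INPUT=<bam> argument per sample name given
    args=""
    for s in "$@"; do
        args="$args INPUT=$(pick_bam "$s")"
    done
    echo "${args# }"   # drop the leading space
}

echo "MergeSamFiles arguments: $(merge_args Sherman Cohen) OUTPUT=ShermanCohenMerged.bam"
```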
  
  
-Finall !! run gatk+Finally !! run gatk
  
 <code> <code>
-java -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \+java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \
    -T UnifiedGenotyper \    -T UnifiedGenotyper \
    -I ShermanCohenMerged.sorted.bam \    -I ShermanCohenMerged.sorted.bam \
Line 115: Line 135:
    -glm SNP \    -glm SNP \
    -o PTC_human.gatk.vcf    -o PTC_human.gatk.vcf
 +
 +</code>
 +
 +If you want to load the vcf file into IGV, remember to index it first.
 +<code>
 +module load igvtools
 +igvtools index PTC_human.gatk.vcf
 +</code>
 +
 +If you would like to generate a table from the vcf file, use the following command:
 +<code>
 +java -Djava.io.tmpdir=/var/scratch/mkatari/tmp -jar /export/apps/GenomeAnalysisTK/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar \
 +     -R PTC_Human.fasta \
 +     -T VariantsToTable \
 +     -V PTC_human.gatk.vcf \
 +     -F CHROM -F POS -F ID -F QUAL -F AC \
 +     -GF GT -GF GQ \
 +     -o PTC_human.gatk.vcf.table
 </code> </code>
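The resulting table is tab-delimited with a header line, so it can be filtered with standard tools. A minimal sketch that keeps the header and the rows at or above a QUAL threshold, assuming QUAL is the fourth column as in the `-F` order of the VariantsToTable command. `filter_by_qual` is a hypothetical helper and the threshold of 50 is an arbitrary example, not a recommendation.

```shell
# Sketch: keep the header and rows whose QUAL (column 4) meets a threshold.
# filter_by_qual is a hypothetical helper; it reads the table on stdin.
filter_by_qual() {
    # $1: minimum QUAL value
    awk -F'\t' -v min="$1" 'NR == 1 || $4 + 0 >= min'
}

# Only runs when the table from the notes is present; 50 is an arbitrary example.
if [ -f PTC_human.gatk.vcf.table ]; then
    filter_by_qual 50 < PTC_human.gatk.vcf.table > PTC_human.gatk.qual50.table
fi
```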
mkatari-bioinformatics-august-2013-gatknotes.txt · Last modified: 2016/08/17 08:37 by mkatari