mkatari-bioinformatics-august-2013-bioinformatics-august-2013-mpileup

mkatari-bioinformatics-august-2013-bioinformatics-august-2013-mpileup [2015/04/02 12:04] – created mkatari
mkatari-bioinformatics-august-2013-bioinformatics-august-2013-mpileup [2015/11/18 19:48] (current) mkatari
 [[mkatari-bioinformatics-august-2013|Back to Manny's Bioinformatics Workshop Home]]
  
 +====== Identifying SNPs using mpileup ======
  
 +A quick way to call SNPs is to use mpileup, one of the tools provided by samtools. However, if you want more control over your results, consider the GATK pipeline found [[mkatari-bioinformatics-august-2013-gatknotes|Processing files and Running GATK|here]]. The steps to process the files are almost the same as in the GATK pipeline; the difference is that mpileup also lets you provide the different groups (Read Groups) as separate bam files. Below is a recommended pipeline when working with mpileup.
 +
 +First create the Bowtie2 index files for alignment, and index the reference fasta with samtools.
 +<code>
 +module load bowtie2
 +module load samtools
 +
 +bowtie2-build PTC_Human.fasta PTC_Human
 +samtools faidx PTC_Human.fasta
 +</code>
 +
 +The following steps have to be executed for every sample:
 +  * align using Bowtie2
 +  * convert to bam
 +  * sort the bam
 +  * remove duplicates
 +  * create a bam index for dedup bam files
 +
 +<code>
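 +# align single-end reads (-U) against the PTC_Human index
 +bowtie2 -x PTC_Human -U Cohen.fastq -S Cohen.sam
 +# convert sam to bam
 +samtools view -bS Cohen.sam > Cohen.bam
 +# sort (samtools 0.1.x syntax: the second argument is the output prefix)
 +samtools sort Cohen.bam Cohen_sorted
 +samtools index Cohen_sorted.bam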
 +</code>
 +
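The per-sample steps listed above can be repeated for every sample with a small loop. The following is only a sketch under a few assumptions: the sample names are taken from the bam files used in the mpileup step, the fastq file names for samples other than Cohen are guessed to follow the same pattern, the reads are single-end, and the samtools 0.1.x syntax used elsewhere on this page applies. The `DRY_RUN` variable and the `sample_commands` helper are hypothetical names introduced for illustration. `samtools rmdup -s` covers the "remove duplicates" step for single-end reads.

```shell
# Sketch: generate (and optionally run) the per-sample commands.
# Assumes samtools 0.1.x syntax and single-end reads; names are illustrative.
set -eu

DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands, 0 = execute them
SAMPLES="Cohen Linder Rikhi Sherman" # assumed: one <name>.fastq per sample

# Print the five commands for one sample, one per line.
sample_commands() {
    echo "bowtie2 -x PTC_Human -U $1.fastq -S $1.sam"
    echo "samtools view -bS $1.sam > $1.bam"
    echo "samtools sort $1.bam $1_sorted"
    echo "samtools rmdup -s $1_sorted.bam $1_dedup.bam"
    echo "samtools index $1_dedup.bam"
}

for s in $SAMPLES; do
    if [ "$DRY_RUN" = "1" ]; then
        sample_commands "$s"
    else
        # sh -e stops at the first failing command
        sample_commands "$s" | sh -e
    fi
done
```

By default the script only prints the commands so they can be inspected; running it with `DRY_RUN=0` executes them. If you use the deduplicated files, pass the `_dedup.bam` files to the later steps.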
 +
 +For further curation:
 +  * realign indels within each sample
 +
 +
 +Once all samples are processed, you can create read groups for them and then merge them into one file. This is not necessary for mpileup, but it is for GATK, so it is worth doing if you plan to use both methods.
 +  * create read groups
 +  * merge all samples to one bam file
 +  * sort and index this merged bam file
 +
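The read-group and merge steps above could look roughly like the sketch below. It follows the pattern from the samtools 0.1.x manual, where `samtools merge -r` tags each read with an RG inferred from its input file name (without `.bam`) and `-h` supplies the `@RG` header lines. The platform value `PL:ILLUMINA`, the output names `rg.txt`, `merged.bam` and `merged_sorted`, and the helper names `rg_header`, `merge_commands` and `DRY_RUN` are assumptions made for illustration.

```shell
# Sketch: build an @RG header and the merge/sort/index commands.
# Assumes samtools 0.1.x syntax; file and helper names are illustrative.
set -eu

DRY_RUN="${DRY_RUN:-1}"              # 1 = only print, 0 = execute
SAMPLES="Cohen Linder Rikhi Sherman"

# One tab-separated @RG line per sample. With "merge -r" the RG ID is taken
# from the input file name, so the ID here has to match it.
rg_header() {
    for s in $SAMPLES; do
        printf '@RG\tID:%s_sorted\tSM:%s\tPL:ILLUMINA\n' "$s" "$s"
    done
}

# Print the merge, sort and index commands.
merge_commands() {
    inputs=""
    for s in $SAMPLES; do inputs="$inputs ${s}_sorted.bam"; done
    echo "samtools merge -rh rg.txt merged.bam$inputs"
    echo "samtools sort merged.bam merged_sorted"
    echo "samtools index merged_sorted.bam"
}

if [ "$DRY_RUN" = "1" ]; then
    rg_header
    merge_commands
else
    rg_header > rg.txt
    merge_commands | sh -e
fi
```

The inputs here are the `_sorted.bam` files; substitute the deduplicated files (and matching RG IDs) if you created them.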
 +Running mpileup, assuming the samples have no read groups and have not been merged:
 +  * run mpileup
 +  * convert bcf to vcf
 +  * filter using vcfutils
 +  * index vcf file to visualize on IGV
 +
 +
 +
 +<code>
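 +# -u: uncompressed bcf output, -f: indexed reference fasta
 +# mpileup needs coordinate-sorted bam files, so use the sorted output
 +samtools mpileup -uf PTC_Human.fasta \
 +         Cohen_sorted.bam \
 +         Linder_sorted.bam \
 +         Rikhi_sorted.bam \
 +         Sherman_sorted.bam > PTC_human.bcf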
 +</code>
 +
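 +Run bcftools to call the SNPs. These options belong to the bcftools bundled with samtools 0.1.x; in later releases variant calling moved to ''bcftools call''.
 +  * b: save the output as bcf
 +  * v: output potential variant sites only
 +  * c: call SNPs
 +  * g: call genotypes at variant sites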
 +
 +<code>
 +bcftools view -bvcg PTC_human.bcf > PTC_human.raw.bcf &
 +</code>
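
The remaining steps from the list above (convert bcf to vcf, filter using vcfutils, index the vcf for IGV) are not shown on this page. Once the bcftools job has finished, they might be sketched as follows. The depth cutoff `-D100` is the example value from the samtools documentation and should be tuned to your coverage. Using `bgzip` and `tabix` to index the vcf is an assumption, since the page does not say which indexing tool was intended. The output file names and the `filter_commands` and `DRY_RUN` names are likewise illustrative.

```shell
# Sketch: convert bcf to vcf, filter, and index for IGV.
# Assumes samtools 0.1.x bcftools/vcfutils.pl plus bgzip and tabix on the PATH.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print, 0 = execute
PREFIX="PTC_human"

filter_commands() {
    # bcf -> vcf on stdout, then drop sites with read depth above 100
    echo "bcftools view ${PREFIX}.raw.bcf | vcfutils.pl varFilter -D100 > ${PREFIX}.flt.vcf"
    # compress and index so the vcf can be loaded in IGV
    echo "bgzip ${PREFIX}.flt.vcf"
    echo "tabix -p vcf ${PREFIX}.flt.vcf.gz"
}

if [ "$DRY_RUN" = "1" ]; then
    filter_commands
else
    filter_commands | sh -e
fi
```

As written, the script prints the three commands; set `DRY_RUN=0` to run them.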
mkatari-bioinformatics-august-2013-bioinformatics-august-2013-mpileup.1427976285.txt.gz · Last modified: 2015/04/02 12:04 by mkatari