Back to Manny's Bioinformatics Workshop Home

Identifying SNPs using mpileup

A quick way to call SNPs is to use mpileup which is one of the tools provided by samtools. However if you want control over your results, consider the GATK pipeline found Processing files and Running GATK|here. The steps to process the file are almost the same compared to the GATK pipeline, however mpileup allows you to provide the different groups (Read Groups) as separate bam files as well.Below is a recommended pipeline when working with mpileup.

First create the bowtie index files for alignment.

module load bowtie2
module load samtools

bowtie2-build PTC_Human.fasta PTC_Human
samtools faidx PTC_Human.fasta

The following steps have to executes for all samples

bowtie2 -x PTC_Human -U Cohen.fastq -S Cohen.sam
samtools view -bS Cohen.sam > Cohen.bam
samtools sort Cohen.bam Cohen_sorted
samtools index Cohen_sorted.bam

For further curation

Once they are all processed you can create read groups for all the samples and then merge then into one file. This is not necessary for mpileup but it is for GATK. So if you are planning to use both methods it is not a bad idea.

Running mpileup assuming no read groups and not merged

samtools mpileup -uf PTC_Human.fasta \
         Cohen.bam \
         Linder.bam \
         Rikhi.bam \
         Sherman.bam > PTC_human.bcf

Run bcf tools to call the snps:

bcftools view -bvcg PTC_human.bcf > PTC_human.raw.bcf &