
Creating a Bowtie2 Index and Performing an Alignment Using Cassava as a Reference

This is a quick example of how to build a Bowtie2 index and run an alignment with bowtie2. Normally you will create the index only once, so there is no need to write a special script for it. Just make sure you are in interactive mode, and we can do everything on the command line.

Once you have found the sequence you want to use as a reference on the web, you can download it quickly with the Linux command wget. To keep our data organized, let's create a directory called cassava where we will store the file.

So the first steps are to create the directory and download the file.

mkdir ~/cassava

cd ~/cassava

wget ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Mesculenta/assembly/Mesculenta_147.fa.gz

Now we will uncompress the file using gunzip.

gunzip Mesculenta_147.fa.gz
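Note that gunzip replaces the .gz file with the uncompressed one. If you would rather keep the compressed copy, one option is gzip -dc, which writes the uncompressed data to standard output. The sketch below shows this on a tiny stand-in file (demo.fa is a hypothetical name created only for the demonstration, not the cassava assembly), and also uses gzip -t to check that the archive is intact first.

```shell
# Work in a scratch directory so nothing in ~/cassava is touched.
workdir=$(mktemp -d)

# Hypothetical stand-in for the downloaded assembly.
printf '>scaffold00001\nACGT\n' > "$workdir/demo.fa"
gzip "$workdir/demo.fa"                 # creates demo.fa.gz and removes demo.fa

# Check the archive's integrity before uncompressing it.
gzip -t "$workdir/demo.fa.gz" && echo "archive OK"

# Uncompress to a new file while keeping the .gz copy.
gzip -dc "$workdir/demo.fa.gz" > "$workdir/demo.fa"

ls "$workdir"
```

For the real file, the equivalent would be gzip -dc Mesculenta_147.fa.gz > Mesculenta_147.fa.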

To see how many scaffolds we have in our file, we can grep for the greater-than sign, which starts every FASTA header line, and then use the command wc to count the matching lines.

grep ">" Mesculenta_147.fa | wc -l
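As a small variation on the command above, grep can do the counting itself with -c, and anchoring the pattern with ^ restricts the match to lines that begin with the greater-than sign. The sketch below runs on a tiny stand-in FASTA file written to a temporary location (the scaffold names and sequences are made up for illustration).

```shell
# Create a tiny stand-in FASTA file with two made-up scaffolds.
tmp_fa=$(mktemp)
printf '>scaffold00001\nACGT\nACGT\n>scaffold00002\nGGCC\n' > "$tmp_fa"

# -c prints the number of matching lines; '^>' matches header lines only.
n_scaffolds=$(grep -c '^>' "$tmp_fa")
echo "scaffolds: $n_scaffolds"

rm -f "$tmp_fa"
```

On the cassava assembly the same idea would be grep -c '^>' Mesculenta_147.fa.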

In order to use Bowtie2 commands we first have to load the module.

module load bowtie2

In order to create the index, which will be used like a database when we are aligning the reads to the reference, we use the command bowtie2-build. For cassava this should take about 10 minutes.

bowtie2-build Mesculenta_147.fa cassava
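When bowtie2-build finishes, it writes six files named after the index basename: cassava.1.bt2, cassava.2.bt2, cassava.3.bt2, cassava.4.bt2, cassava.rev.1.bt2 and cassava.rev.2.bt2. A minimal sketch for confirming they are all present is shown below. The helper name check_index is hypothetical, and it assumes a regular (small) index with the .bt2 suffix rather than the .bt2l suffix used for very large genomes.

```shell
# check_index BASENAME
# Hypothetical helper: reports any of the six expected .bt2 files that are
# missing and returns non-zero if the index is incomplete.
check_index() {
  base=$1
  missing=0
  for suffix in 1 2 3 4 rev.1 rev.2; do
    if [ ! -f "$base.$suffix.bt2" ]; then
      echo "missing: $base.$suffix.bt2"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Run from inside ~/cassava, where the index was built.
check_index cassava && echo "index complete" || echo "index incomplete"
```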

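The options for running bowtie2 differ slightly depending on the type of reads you have (single-end or paired-end). For single-end reads, use -U followed by the file name. For paired-end reads, use -1 followed by the first file and -2 followed by the second file. The -x option takes the index basename, including its path, and -S names the SAM output file. The command below uses cassava/cassava, which assumes it is run from the directory that contains the cassava directory (your home directory in this example). See the link for bowtie2 for more detailed options and arguments that bowtie2 accepts.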

bowtie2 -x cassava/cassava -U test.fastq -S test.sam
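For paired-end reads, the same command takes -1 and -2 instead of -U. The sketch below uses hypothetical file names (test_1.fastq, test_2.fastq and test_paired.sam are placeholders, not files from this workshop). It only runs bowtie2 if the program and both read files are available, and otherwise just prints the command it would have run.

```shell
# Hypothetical paired-end inputs and output; adjust to your own file names.
index=cassava/cassava
reads_1=test_1.fastq
reads_2=test_2.fastq
out_sam=test_paired.sam

# Human-readable preview of the command.
cmd_preview="bowtie2 -x $index -1 $reads_1 -2 $reads_2 -S $out_sam"

if command -v bowtie2 >/dev/null 2>&1 && [ -f "$reads_1" ] && [ -f "$reads_2" ]; then
  # -1 and -2 take the two mate files; -S names the SAM output.
  bowtie2 -x "$index" -1 "$reads_1" -2 "$reads_2" -S "$out_sam"
else
  echo "would run: $cmd_preview"
fi
```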