[[mkatari-bioinformatics-august-2013|Back to Manny's Bioinformatics Workshop Home]]

====== Creating a Bowtie Index and Performing an alignment using Cassava as a reference ======

This is a quick example of how to build a Bowtie2 index and run bowtie2. The index only needs to be created once, so there is no need to write a special script for it. Just make sure you are in **interactive mode**, and we can do everything on the command line.

Once you have found the sequence on the web that you want to use as a reference, you can use the Linux command **wget** to download it quickly. To keep our data organized, let's create a directory called **cassava** in our home directory, where we will store the file.

So the first steps are to create the directory and download the file.

<code>
mkdir ~/cassava

cd ~/cassava

wget ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Mesculenta/assembly/Mesculenta_147.fa.gz
</code>

Now we will uncompress the file using **gunzip**.

<code>
gunzip Mesculenta_147.fa.gz
</code>

To see how many scaffolds we have in our file, we can **grep** for the greater-than sign, which starts every FASTA header line, and then use the command **wc** to count the matching lines.

<code>
grep ">" Mesculenta_147.fa | wc -l
</code>
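As a small sketch of the same counting idea, grep can also do the counting itself with the -c option, and anchoring the pattern with ^ makes sure only lines that begin with the greater-than sign are counted. The function name count_scaffolds and the tiny demo file are made up for illustration; they are not part of the workshop data.

```shell
# Count FASTA records by matching '>' only at the start of a line.
# count_scaffolds is a hypothetical helper name used for this sketch.
count_scaffolds() {
    grep -c '^>' "$1"
}

# A tiny made-up FASTA file with three records, for illustration only.
demo_fasta=$(mktemp)
printf '>scaffold_1\nACGT\n>scaffold_2\nGGCC\nTTAA\n>scaffold_3\nAC\n' > "$demo_fasta"

count_scaffolds "$demo_fasta"   # prints 3
```

On the real file this would be run as count_scaffolds Mesculenta_147.fa and should give the same number as the grep and wc pipeline above.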

In order to use the Bowtie2 commands, we first have to load the module.

<code>
module load bowtie2
</code>

In order to create the index, which will be used like a database when we are aligning the reads to the reference, we use the command bowtie2-build. For cassava this should take about 10 minutes.

<code>
bowtie2-build Mesculenta_147.fa cassava
</code>
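When bowtie2-build finishes, it writes the index as a set of files that share the base name given on the command line. For a genome of this size these are normally six files ending in .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2 (very large genomes get a .bt2l extension instead, which this sketch does not handle). A quick way to confirm the build completed is to check that all six exist and are not empty. The helper name index_is_complete is hypothetical.

```shell
# Check that all six small-index files exist and are non-empty
# for a given index base name (for example: cassava).
# index_is_complete is a hypothetical helper name used for this sketch.
index_is_complete() {
    base=$1
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -s "$base.$suffix" ]; then
            echo "missing or empty: $base.$suffix"
            return 1
        fi
    done
    echo "index $base looks complete"
}

# Run from inside the directory where bowtie2-build was executed.
index_is_complete cassava || true
```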

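Depending on the type of reads you have (single-end or paired-end), the options for running bowtie2 are slightly different. For single-end reads, use -U followed by the file name. For paired-end reads, use -1 followed by the first file and -2 followed by the second file. In both cases, -x gives the base name of the index created with bowtie2-build and -S names the SAM output file. The example below is run from the directory that contains **cassava**, which is why the index is referred to as cassava/cassava.
See the link for [[http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml|bowtie2]] for more detailed options and arguments that bowtie2 accepts.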

<code>
bowtie2 -x cassava/cassava -U test.fastq -S test.sam
</code>
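To make the difference between the single-end and paired-end forms concrete, here is a minimal sketch that only prints the bowtie2 command it would run, choosing -U or -1/-2 based on how many read files are given. The function name bowtie2_cmd and the file names reads_1.fastq, reads_2.fastq and pairs.sam are placeholders, not files from this workshop.

```shell
# Print (do not run) a bowtie2 command for one or two FASTQ files.
# Usage: bowtie2_cmd INDEX OUT.sam READS.fastq [MATES.fastq]
# bowtie2_cmd is a hypothetical helper name used for this sketch.
bowtie2_cmd() {
    index=$1
    out=$2
    if [ "$#" -eq 3 ]; then
        # One read file: single-end mode with -U.
        echo "bowtie2 -x $index -U $3 -S $out"
    elif [ "$#" -eq 4 ]; then
        # Two read files: paired-end mode with -1 and -2.
        echo "bowtie2 -x $index -1 $3 -2 $4 -S $out"
    else
        echo "usage: bowtie2_cmd INDEX OUT.sam READS.fastq [MATES.fastq]" >&2
        return 1
    fi
}

# Single-end: reproduces the command shown above.
bowtie2_cmd cassava/cassava test.sam test.fastq
# Paired-end: placeholder file names.
bowtie2_cmd cassava/cassava pairs.sam reads_1.fastq reads_2.fastq
```

Once the printed command looks right, it can be copied to the command line and run with the bowtie2 module loaded.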