--- population-diversity:microbiome-analysis-with-qiime2-using-illumina-paired-end-sequence-data [2020/04/29 13:17] – removed bngina
+++ population-diversity:microbiome-analysis-with-qiime2-using-illumina-paired-end-sequence-data [2020/04/29 15:09] (current) – [Qiime2 data filtering and feature (OTU) table construction] bngina
@@ Line 1: / Line 1: @@
+====== Analysis of Microbiome data ======
+Microbial analysis aims to understand microbes and their functions in their environments. [[https://docs.qiime2.org/2020.2/tutorials/overview/| QIIME2]] (**Q**uantitative **I**nsights **I**nto **M**icrobial **E**cology) is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
+We outline a pipeline to analyse 16S/18S paired-end sequencing data. Qiime2 can be used to analyse single-end and paired-end data.
+==== Importing data into Qiime2 ====
+[[https://docs.qiime2.org/2020.2/concepts/|Data]] used and produced by qiime2 are stored as artifacts. Artifacts contain data and metadata and have a ''.qza'' extension. Qiime2 provides methods to view the information (data and metadata) stored in artifacts by creating visualizations, which have a ''.qzv'' extension.
+We import raw fastq files into qiime2. Depending on the sequencing platform used to generate the data, the data may or may not already be de-multiplexed into sample-specific fastq files. The MiSeq platform at [[ https://hub.africabiosciences.org/| BecA-ILRI Hub]] de-multiplexes the data, so we get sample-specific fastq files.
+Read more about [[ https://docs.qiime2.org/2020.2/tutorials/importing/|different data types and importing them into qiime2]] for analysis. We will be importing data described as type //Casava 1.8 paired-end de-multiplexed fastq//.
+To import the fastq files:
+<code>
+#create a directory to store all the artifacts ('.qza') and the related visualization ('.qzv') files
+mkdir -p /home/mydir/qiime2_data/
+#import the fastq files; --input-path must point to the directory that holds the de-multiplexed fastq.gz files
+qiime tools import \
+ --type 'SampleData[PairedEndSequencesWithQuality]' \
+ --input-path /home/mydir/qiime2_data/ \
+ --input-format CasavaOneEightSingleLanePerSampleDirFmt \
+ --output-path /home/mydir/qiime2_data/my_raw_data.qza
+</code>
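+Before importing, it can help to confirm that the fastq file names follow the Casava 1.8 convention, in which the underscore-separated fields are the sample identifier, the barcode sequence or identifier, the lane number, the read direction (R1 or R2) and the set number, e.g. ''L2S357_15_L001_R1_001.fastq.gz''. The sketch below is only an illustration: the function names ''is_casava_name'' and ''check_casava_dir'' are made up for this example and are not part of qiime2.

```shell
# Succeed (exit status 0) if a file name follows the Casava 1.8 convention,
# e.g. L2S357_15_L001_R1_001.fastq.gz
is_casava_name() {
  printf '%s\n' "$1" |
    grep -Eq '^.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$'
}

# Hypothetical helper: list every file in a directory whose name does not match.
check_casava_dir() {
  for f in "$1"/*; do
    # Skip the unexpanded pattern when the directory is empty.
    [ -e "$f" ] || continue
    name=$(basename "$f")
    is_casava_name "$name" || echo "unexpected file name: $name"
  done
}
```

+Running ''check_casava_dir'' on the directory passed to ''--input-path'' should print nothing when every file is named as expected.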
+We mentioned that qiime2 provides methods to summarize and view artifacts by storing them as visualization (''.qzv'') files. The ''.qzv'' files can be opened in any browser using the [[https://view.qiime2.org/ |qiime viewer]]. To create a visualization for the imported data:
+<code>
+qiime demux summarize \
+ --i-data  /home/mydir/qiime2_data/my_raw_data.qza \
+ --o-visualization /home/mydir/qiime2_data/my_raw_data.qzv
+</code>
+==== Qiime2 data filtering and feature (OTU) table construction ====
+Qiime2 provides two plugins/methods for filtering your sequences to the required quality and length. Sequence variants are then selected from the quality-filtered data, and the results are feature tables, better known as OTU tables, and the representative feature sequences. These methods are [[https://www.ncbi.nlm.nih.gov/pubmed/27214047| Dada2]] and [[ https://msystems.asm.org/content/2/2/e00191-16| Deblur]].
+It is important to keep in mind the size of the amplified region covered by the primers used, and the expected fragment length after read joining. These determine the trim and truncate length parameters for the dada2 and deblur pipelines.
+=== Dada2 ===
+The Dada2 pipeline detects and corrects errors in Illumina amplicon sequence data, and additionally filters out any phiX reads and chimeric sequences identified in the sequencing data.
+To see the usage and the parameters that can be adjusted, get the help for the plugin:
+<code>
+qiime dada2 --help
+</code>
+The data we are using is paired-end, hence we will use the [[https://docs.qiime2.org/2020.2/plugins/available/dada2/|qiime dada2 denoise-paired]] method of the plugin.
+A key parameter to be careful about is //''--p-trunc-q''//, which is a Q-score threshold. When filtering for base quality, dada2 reads each sequence from left to right and truncates the read at the first base whose Q-score is less than or equal to this threshold, which is set to 2 by default. It is best to leave it at the default.
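+The truncation rule can be illustrated with a small shell sketch. This is not how dada2 is implemented; it only mimics the documented behaviour of cutting a read at the first Q-score that is less than or equal to the threshold. The function name ''truncated_length'' and the Q-scores are made up for the example.

```shell
# Print how many leading bases survive quality truncation.
# $1 is the Q-score threshold; the remaining arguments are the
# per-base Q-scores of one read, in order from left to right.
truncated_length() {
  threshold=$1
  shift
  kept=0
  for q in "$@"; do
    # Stop at the first base whose Q-score is at or below the threshold.
    if [ "$q" -le "$threshold" ]; then
      break
    fi
    kept=$((kept + 1))
  done
  echo "$kept"
}

# The fifth base has Q=2, so with the default threshold of 2 only 4 bases are kept.
truncated_length 2 38 37 36 30 2 35 34   # prints 4
```

+Raising the threshold makes truncation happen earlier, which is why a high value can shorten many reads and is best avoided unless you have a reason for it.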
+The other key parameters in quality control of the sequences are those used to trim the forward (''--p-trim-left-f''; ''--p-trunc-len-f'') and reverse (''--p-trim-left-r''; ''--p-trunc-len-r'') reads. //''--p-trim-left-[f|r]''// tells qiime how many bases to trim from the beginning of the sequence, while //''--p-trunc-len-[f|r]''// tells qiime the position at which the sequence should be truncated at the end.
+To determine what values to pass for these two parameters:
+  *Review the Interactive Quality Plot tab in the //''my_raw_data.qzv''// file that was generated by ''qiime demux summarize'' after importing the data, to decide where to trim off the poor quality bases.
+  *Consider the expected fragment size after the reads are joined, particularly for //''--p-trunc-len-[f|r]''//. Refer to the expected fragment length for the primers used to prepare the libraries for sequencing. For example, suppose the expected fragment length for your sequences is 465bp and the read length during sequencing was 300bp. The truncated forward and reverse reads must together cover at least 465bp, plus the bases that overlap so that the reads can be joined, to have a good chance of recovering full-length sequences for the alignment. Forward reads from Illumina sequencing usually have better quality than reverse reads, so we could use a //''--p-trunc-len-f''// of //''250''// and a //''--p-trunc-len-r''// of //''220''//, for a combined length of //''470bp''//. If too few reads are joined with these values, increase the truncation lengths so that the reads overlap more.
+The command would therefore be:
+<code>
+qiime dada2 denoise-paired \
+ --i-demultiplexed-seqs /home/mydir/qiime2_data/my_raw_data.qza \
+ --p-trim-left-f 0 \
+ --p-trim-left-r 0 \
+ --p-trunc-len-f 250 \
+ --p-trunc-len-r 220 \
+ --o-table /home/mydir/qiime2_data/dada2_470_table.qza \
+ --o-representative-sequences /home/mydir/qiime2_data/dada2_470_rep-seqs.qza \
+ --o-denoising-stats /home/mydir/qiime2_data/dada2_470_denoising-stats.qza \
+ --p-n-threads 4
+#summarise the features table to view it,
+qiime feature-table summarize \
+ --i-table /home/mydir/qiime2_data/dada2_470_table.qza \
+ --o-visualization /home/mydir/qiime2_data/dada2_470_table.qzv
+#view a summary of the rep sequences
+qiime feature-table tabulate-seqs \
+ --i-data  /home/mydir/qiime2_data/dada2_470_rep-seqs.qza \
+ --o-visualization  /home/mydir/qiime2_data/dada2_470_rep-seqs.qzv
+#and also view the denoising statistics
+qiime metadata tabulate \
+  --m-input-file /home/mydir/qiime2_data/dada2_470_denoising-stats.qza \
+  --o-visualization /home/mydir/qiime2_data/dada2_470_denoising-stats.qzv
+</code>
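+A quick sanity check on the truncation lengths is to work out how many bases of overlap they leave for joining the reads. The sketch below is plain shell arithmetic, not a qiime2 command. The function name ''expected_overlap'' is made up for this example, and the minimum overlap of 12 bases is an assumption based on the DADA2 default for merging pairs, so confirm it for the version you are running.

```shell
# Bases of overlap left between the truncated forward and reverse reads.
# Arguments: forward truncation length, reverse truncation length,
# expected fragment (amplicon) length, all in bases.
expected_overlap() {
  echo $(($1 + $2 - $3))
}

# Assumed minimum overlap needed to join a read pair (see the note above).
min_overlap=12

# Values from the example: 250 + 220 bases against a 465bp fragment.
overlap=$(expected_overlap 250 220 465)
if [ "$overlap" -lt "$min_overlap" ]; then
  echo "overlap of ${overlap} bases is below ${min_overlap}: consider longer truncation lengths"
else
  echo "overlap of ${overlap} bases should be enough to join the reads"
fi
```

+With the example values the overlap is only 5 bases, so it is worth checking the merged column of the denoising statistics and, if few reads were joined, re-running with longer truncation lengths where the quality plots allow it.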