To assemble a genome or transcriptome, computer programs typically use data consisting of single and paired reads. Single reads are simply the short sequenced fragments themselves; they can be joined up through overlapping regions into a continuous sequence known as a 'contig'.
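As a rough illustration of joining reads through overlapping regions, the toy sketch below greedily merges the pair of reads with the longest suffix-to-prefix overlap until no overlap remains. The function names (`overlap`, `greedy_assemble`), the `min_len` threshold and the example reads are all made up for illustration. Real assemblers use far more elaborate graph-based methods and must tolerate sequencing errors, which this sketch does not.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if the longest match is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the longest exact overlap.

    Returns the list of contigs left when no pair overlaps by at
    least `min_len` bases. Exact matching only: no error tolerance.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return reads
        # Join the two reads, writing the shared region only once.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]


# Hypothetical reads that tile a short sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```

With these three made-up reads, each consecutive pair shares four bases, so the sketch ends with a single contig. If two reads shared fewer than `min_len` bases, they would stay as separate contigs.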
Repetitive sequences, polymorphisms, missing data and sequencing errors ultimately limit the length of the contigs that assemblers can build.
Knowing that paired reads were generated from the same piece of DNA can help link contigs into 'scaffolds', ordered assemblies of contigs with gaps in between. Paired-read data can also indicate the size of repetitive regions and how far apart contigs are.
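The way paired reads indicate how far apart contigs are can be sketched with simple arithmetic. The example assumes a simplified setup that is not taken from the text: contig A precedes contig B in the same orientation, the expected fragment length (`insert_size`) is known, and each pair is described by where its first read starts on A and where its second read ends on B. The name `estimate_gap` and all numbers are hypothetical.

```python
from statistics import median


def estimate_gap(len_a, pairs, insert_size):
    """Estimate the gap between contig A and contig B from read pairs.

    len_a       -- length of contig A in bases
    pairs       -- list of (start_a, end_b): 0-based start of read 1 on A,
                   and exclusive end of read 2 on B
    insert_size -- assumed fragment length spanned by each pair

    Each fragment covers (len_a - start_a) bases of A, the unknown gap,
    and end_b bases of B, so gap = insert_size - (len_a - start_a) - end_b.
    The median over all pairs dampens the effect of outliers.
    """
    estimates = [insert_size - (len_a - start_a) - end_b
                 for start_a, end_b in pairs]
    return median(estimates)


# Hypothetical data: a 1,000-base contig A and 500-base fragments.
pairs = [(800, 100), (850, 160), (900, 190)]
print(estimate_gap(1000, pairs, insert_size=500))
```

A negative estimate under these assumptions would suggest that the two contigs overlap rather than being separated by a gap.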
| Choice of De Novo Assemblers | |
| --- | --- |
| Whole genome analysis (WGS) | Genomic de novo assembly |
| Transcriptome analysis (RNASeq) | Transcriptome de novo assembly |