seqwish implements a lossless conversion from pairwise alignments between sequences to a variation graph encoding the sequences and their alignments. The input is typically a set of all-versus-all alignments, but the exact structure of the alignment set may be defined in an application-specific way. The algorithm uses a series of disk-backed sorts and passes over the alignment and sequence inputs, so the graph can be constructed from the very large inputs that are common when working with large numbers of noisy input sequences. Memory usage during construction and traversal is limited by the use of sorted disk-backed arrays and succinct rank/select dictionaries, which record a queryable version of the graph.
See versions of seqwish that are available:
$ module avail seqwish
Load one version into your environment and run it:
$ module load seqwish/0.6
$ seqwish
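Once the module is loaded, a typical run might look like the sketch below. It is not part of the original notes. It assumes a FASTA file with the hypothetical name input.fa, and it assumes minimap2 is also on your PATH to produce the all-versus-all alignments. The seqwish flags (-s, -p, -g) follow the upstream README, so confirm them with `seqwish --help` for the installed 0.6 build.

```shell
# Hypothetical input: a FASTA file holding every sequence to put in the graph.
seqs="input.fa"
paf="${seqs%.fa}.paf"   # all-versus-all alignments in PAF format
gfa="${seqs%.fa}.gfa"   # resulting variation graph in GFA format

if [ -f "$seqs" ] && command -v minimap2 >/dev/null 2>&1 \
        && command -v seqwish >/dev/null 2>&1; then
    # -c writes CIGAR strings into the PAF; -X skips self and dual mappings.
    minimap2 -c -X "$seqs" "$seqs" > "$paf"
    # -s: input sequences, -p: PAF alignments, -g: GFA output.
    seqwish -s "$seqs" -p "$paf" -g "$gfa"
else
    echo "skipping: need $seqs plus minimap2 and seqwish on PATH"
fi
```

The CIGAR strings matter because seqwish builds the graph from base-level alignments. Any aligner that writes them into PAF should work in place of minimap2.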
Notes from the sysadmin during installation.
$ cd /tmp
$ git clone --recursive https://github.com/ekg/seqwish.git -b v0.6
$ cd seqwish
# enable a newer compiler because CentOS's default GCC 4.8.x throws errors...
$ sudo yum install devtoolset-7-libatomic-devel.x86_64
$ scl enable devtoolset-7 bash
$ cmake3 -H. -Bbuild && cmake3 --build build -- -j 4
$ sudo mkdir -p /export/apps/seqwish/0.6/bin
$ sudo cp bin/seqwish /export/apps/seqwish/0.6/bin
The build needs a compiler newer than CentOS's default GCC 4.8.5, as well as libatomic.