kmerfreq

kmerfreq counts the frequency of K-mers (substrings of length K) in input sequence data, typically sequencing reads; reference genome data also works. The forward and reverse strands of a K-mer are treated as the same K-mer, and only the strand with the smaller bit-value is used to represent it. Each unique K-mer's frequency is stored in a 16-bit integer with a maximum value of 65535, so any K-mer with a frequency above 65535 is recorded as 65535. The program stores all K-mer frequencies in an array of 4^K 16-bit integers (2 bytes each), using the K-mer bit-value as the index, so total memory usage is 2 * 4^K bytes. For K-mer sizes 15, 16, 17, 18, and 19, it consumes a constant 2 GiB, 8 GiB, 32 GiB, 128 GiB, and 512 GiB of memory, respectively. kmerfreq uses a simple, highly parallel design to run as fast as possible. The output files can be used as input for the programs GCE and correct_error_reads.
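
The counting scheme described above can be sketched in a few lines of Python. This is only an illustration, not kmerfreq's actual source: the 2-bit base encoding (A=0, C=1, G=2, T=3), the function names, the use of a dict in place of the 4^K array, and the choice to skip K-mers containing non-ACGT characters are all assumptions made for the example. The "reverse strand" is taken to mean the reverse complement.

```python
# Illustrative sketch only; this is NOT kmerfreq's source code.
# Assumed 2-bit encoding (kmerfreq's real mapping may differ).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MAX_COUNT = 65535  # largest value an unsigned 16-bit integer can hold


def bit_value(kmer):
    """Pack a K-mer into an integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def reverse_complement(kmer):
    """Return the K-mer as read from the opposite strand."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_index(kmer):
    """Both strands map to the same index: the smaller bit-value."""
    return min(bit_value(kmer), bit_value(reverse_complement(kmer)))


def count_kmers(reads, k):
    """Count canonical K-mers, saturating each count at MAX_COUNT."""
    counts = {}  # a dict stands in for the 4^K array of 16-bit integers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if any(base not in ENCODE for base in kmer):
                continue  # assumption: skip K-mers with N or other symbols
            idx = canonical_index(kmer)
            counts[idx] = min(counts.get(idx, 0) + 1, MAX_COUNT)
    return counts


def table_bytes(k):
    """Memory for a full 4^K table of 2-byte counters."""
    return 2 * 4 ** k


if __name__ == "__main__":
    print(count_kmers(["ACGT"], 3))
    for k in range(15, 20):
        print(k, table_bytes(k) // 2 ** 30, "GiB")
```

In this sketch, ACG and its reverse complement CGT share one index, so the read ACGT contributes two counts to a single entry. table_bytes reproduces the memory figures quoted above, which depend only on K and not on the amount of input data.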

Information

Usage

See which versions are available:

$ module avail kmerfreq

Load one version into your environment and run it:

$ module load kmerfreq/git-90fca00d
$ kmerfreq

Installation

Notes from the sysadmin during installation:

$ cd /tmp
$ git clone https://github.com/fanagislab/kmerfreq.git
$ cd kmerfreq
$ rm kmerfreq
$ make
$ sudo mkdir -p /export/apps/kmerfreq/git-90fca00d/bin
$ sudo cp kmerfreq /export/apps/kmerfreq/git-90fca00d/bin
kmerfreq-software.1688973566.txt.gz · Last modified: 2023/07/10 07:19 by aorth