kmerfreq counts the frequency of K-mers (subsequences of length K) in input sequence data, typically sequencing reads, although reference genome data also works. The forward and reverse-complement strands of a k-mer are treated as the same k-mer, and only the strand with the smaller bit-value is used to represent it. The frequency of each unique K-mer is stored in a 16-bit integer with a maximum value of 65535, so any K-mer with a frequency above 65535 is recorded as 65535. The program stores all frequency values in an array of 4^K 16-bit integers (2 bytes each), indexed by the k-mer bit-value, so the total memory usage is 2 * 4^K bytes. For K-mer sizes 15, 16, 17, 18, and 19, it consumes a constant 2, 8, 32, 128, and 512 GiB of memory, respectively, regardless of input size. kmerfreq uses a simple, highly parallel design to run as fast as possible. The output files can be used as input for the programs GCE and correct_error_reads.
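The memory formula and the strand rule described above can be checked with a short shell sketch. This is an illustration only, not code from kmerfreq: the function names `kmerfreq_mem_bytes` and `canonical_kmer` are hypothetical, and the strand comparison assumes the common 2-bit encoding A=0, C=1, G=2, T=3, under which alphabetical order of two equal-length uppercase k-mers matches the order of their bit-values.

```shell
#!/usr/bin/env bash

# Bytes needed for the frequency table: 4^K entries of 2 bytes each.
kmerfreq_mem_bytes() {
    local k=$1
    echo $(( 2 * 4 ** k ))
}

# Return the strand of a k-mer that would represent it, assuming the
# encoding A=0, C=1, G=2, T=3 (an assumption, not confirmed by the source).
# Input is expected to be uppercase A/C/G/T only.
canonical_kmer() {
    local kmer=$1 rc
    # Reverse complement: reverse the string, then swap A<->T and C<->G.
    rc=$(printf '%s\n' "$kmer" | rev | tr 'ACGT' 'TGCA')
    if [[ "$rc" < "$kmer" ]]; then
        printf '%s\n' "$rc"
    else
        printf '%s\n' "$kmer"
    fi
}

# Print the table size in GiB for the K values listed above.
for k in 15 16 17 18 19; do
    bytes=$(kmerfreq_mem_bytes "$k")
    echo "K=$k: $(( bytes / 1024 ** 3 )) GiB"
done
```

Running the loop is a quick way to pick a memory request for a given K before submitting a job; remember to leave some headroom above the table size itself.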
See which versions are available:
$ module avail kmerfreq
Load one version into your environment and run it:
$ module load kmerfreq/git-90fca00d
$ kmerfreq
Note: Please use the -t option to set the number of CPU threads launched by kmerfreq. This number should match the number of CPUs requested in your SLURM batch job. Otherwise kmerfreq will attempt to use ten (10) threads by default, which may impact your job's stability.
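One way to keep the thread count and the CPU request in sync is to pass SLURM's own `SLURM_CPUS_PER_TASK` variable to -t. The sketch below writes a minimal job script to a file; it is an example under assumptions, not a site-approved template. The file name `kmerfreq_job.sh`, the input file `reads_files.lib`, the CPU count of 8, and the 36G memory figure (a placeholder that assumes K=17, where the table alone takes 32 GiB) are all illustrative. Run `kmerfreq` without arguments to see the options your installed version actually accepts.

```shell
# Write a minimal job script. The heredoc delimiter is quoted so that
# ${SLURM_CPUS_PER_TASK} is expanded when the job runs, not now.
cat > kmerfreq_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=kmerfreq
# Example CPU request; kmerfreq is told to use exactly this many threads below.
#SBATCH --cpus-per-task=8
# Placeholder memory request; it must exceed 2 * 4^K bytes for your K.
#SBATCH --mem=36G

module load kmerfreq/git-90fca00d

# Match the thread count to the CPUs SLURM allocated.
# reads_files.lib is a hypothetical input file name.
kmerfreq -t "${SLURM_CPUS_PER_TASK}" reads_files.lib
EOF
```

The script can then be submitted with `sbatch kmerfreq_job.sh`. Changing only the `--cpus-per-task` line keeps the two numbers consistent, since -t reads the value SLURM sets.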
Notes from the sysadmin during installation:
$ cd /tmp
$ git clone https://github.com/fanagislab/kmerfreq.git
$ cd kmerfreq
$ rm kmerfreq
$ make
$ sudo mkdir -p /export/apps/kmerfreq/git-90fca00d/bin
$ sudo cp kmerfreq /export/apps/kmerfreq/git-90fca00d/bin