Biological Sequence Databases on the HPC
Some of the most common biological sequence databases are available on the HPC for use with tools like BLAST. The table below lists each database, its location on the system, and the date it was last updated.
Name | Comments | Updated¹ | Database Location |
---|---|---|---|
nt | NCBI nucleotide collection (v5²) | Sep 30, 2023 | /export/data/bio/ncbi/blast/db/v5 |
nr | NCBI protein collection (v5²) | Sep 30, 2023 | /export/data/bio/ncbi/blast/db/v5 |
UniProt's UniProtKB/Swiss-Prot | Manually curated, most reliable | July 1, 2019 | /export/data/bio/uniprot/blast/db |
UniProt's UniProtKB/TrEMBL | Automated curation | ? | /export/data/bio/uniprot/blast/db |
UniProt's UniRef100 | | ? | /export/data/bio/uniprot/blast/db |
Using These Databases
Tools like BLAST use the BLASTDB environment variable to find the location of the system's BLAST databases. ILRI's BLAST environment modules, such as blast/2.10.0+, automatically set this variable when you load the module.
If you are using different software you will need to set the variable manually, for example:
$ export BLASTDB=$BLASTDB:/export/data/bio/ncbi/blast/db/v5
$ blastn -db nt -query file.seq -out blast.out
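If BLASTDB is empty when you run the export above, the resulting value starts with a stray colon, and running it twice lists the same directory twice. The following is a minimal sketch of one way to avoid both. The function name blastdb_append is hypothetical and is not provided by BLAST or by the ILRI modules; the directories are the ones listed in the table above.

```shell
# Hypothetical helper (not part of BLAST or the ILRI modules): append a
# directory to BLASTDB only if it is not already listed, and avoid a
# leading colon when BLASTDB starts out empty or unset.
blastdb_append() {
    dir="$1"
    case ":${BLASTDB:-}:" in
        *":$dir:"*) ;;                               # already listed, do nothing
        *) BLASTDB="${BLASTDB:+$BLASTDB:}$dir" ;;    # append, adding ":" only if needed
    esac
    export BLASTDB
}

# Directories taken from the table on this page.
blastdb_append /export/data/bio/ncbi/blast/db/v5
blastdb_append /export/data/bio/uniprot/blast/db
echo "$BLASTDB"
```

You could place the function in your shell startup file and call it before running any tool that reads BLASTDB.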
Notes
1. Use the following to determine the date of a BLAST database:
$ module load blast/2.14.1+
$ blastdbcmd -info -db nt | grep Date
2. In 2019 NCBI introduced version 5 of the BLAST database format. Version 5 databases only work with BLAST tools from version 2.9.0 onwards. NCBI no longer updates the version 4 databases, but we have preserved them in a separate directory in case you are using tools that do not support version 5:
/export/data/bio/ncbi/blast/db/v4
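
The version rule in note 2 can be sketched as a small shell check that picks the matching database directory. This is only an illustration: the function names blast_supports_v5 and blastdb_dir_for are hypothetical, the 2.9.0 threshold is the one stated in note 2, and the version string is assumed to look like the ones in the module names on this page (for example 2.10.0+).

```shell
# Hypothetical helper: succeed if a BLAST version string such as "2.10.0+"
# is at least 2.9.0 (the threshold for version 5 databases given in note 2).
blast_supports_v5() {
    ver="${1%+}"                 # drop the trailing "+" used in BLAST+ versions
    old_ifs="$IFS"; IFS=.
    set -- $ver                  # split "2.10.0" into 2, 10 and 0
    IFS="$old_ifs"
    major="${1:-0}"; minor="${2:-0}"
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 9 ]; }
}

# Hypothetical helper: print the NCBI database directory (from this page)
# that matches the given tool version.
blastdb_dir_for() {
    if blast_supports_v5 "$1"; then
        echo /export/data/bio/ncbi/blast/db/v5
    else
        echo /export/data/bio/ncbi/blast/db/v4
    fi
}

blastdb_dir_for "2.10.0+"   # prints the v5 directory
blastdb_dir_for "2.6.0+"    # prints the v4 directory
```

The comparison is done numerically on the major and minor fields, so 2.10.0 is correctly treated as newer than 2.9.0, which a plain string comparison would get wrong.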