Biological Sequence Databases on the HPC

Some of the most common biological sequence databases are available on the HPC for use with tools like BLAST. The table below lists them, along with their location on the system and the date each was last updated.

We endeavor to keep this list updated as the One True List™.

| Name | Comments | Updated¹ | Database Location |
|------|----------|----------|-------------------|
| nt | NCBI nucleotide collection | 2019-06-20 | /export/data/bio/ncbi/blast/db/v5 |
| nr | NCBI protein collection | 2019-06-20 | /export/data/bio/ncbi/blast/db/v5 |
| UniProt's UniProtKB/Swiss-Prot | Manually curated, most reliable | 2019-07-01 | /export/data/bio/uniprot/blast/db |
| UniProt's UniProtKB/TrEMBL | Automated curation | ? | /export/data/bio/uniprot/blast/db |
| UniProt's UniRef100 | | ? | /export/data/bio/uniprot/blast/db |

Using These Databases

Tools like BLAST use the BLASTDB environment variable to find the location of the system's BLAST databases. ILRI's BLAST environment modules like blast/2.10.0+ automatically set this variable when you load the module.

If you are using different software, you will need to set the variable manually. For example:

$ export BLASTDB=$BLASTDB:/export/data/bio/ncbi/blast/db/v5
$ blastn -db nt -query file.seq -out blast.out
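
The export line above leaves a leading colon in BLASTDB when the variable was previously unset, and adds the same directory again each time it is run. The following is a minimal sketch of one way to avoid both. The helper name `blastdb_add` is hypothetical and not something provided on the HPC; the two paths are the ones listed in the table on this page.

```shell
# Hypothetical helper (not part of BLAST or the HPC modules):
# append a directory to BLASTDB only if it is not already listed.
blastdb_add() {
    _dir=$1
    case ":${BLASTDB:-}:" in
        *":$_dir:"*)
            # Directory is already in BLASTDB, nothing to do.
            ;;
        *)
            # Append; the ${VAR:+...} form avoids a leading colon
            # when BLASTDB is empty or unset.
            BLASTDB="${BLASTDB:+$BLASTDB:}$_dir"
            ;;
    esac
    export BLASTDB
}

blastdb_add /export/data/bio/ncbi/blast/db/v5
blastdb_add /export/data/bio/uniprot/blast/db
echo "$BLASTDB"
```

Calling the helper a second time with the same directory leaves BLASTDB unchanged, so it should be safe to put in a shell startup file.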


¹ Use the following to determine the date of a BLAST database:

$ module load blast/2.10.0+
$ blastdbcmd -info -db nt | grep Date
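
To check several databases at once, the same command can be wrapped in a small loop. This is only a sketch: the function name `blastdb_date` is hypothetical, and it assumes you have already run `module load blast/2.10.0+` (or otherwise have `blastdbcmd` on your PATH) and that the database names `nt` and `nr` can be found via BLASTDB.

```shell
# Hypothetical helper: print a database name followed by the
# "Date" line that blastdbcmd reports for it.
blastdb_date() {
    printf '%s\t%s\n' "$1" "$(blastdbcmd -info -db "$1" | grep 'Date')"
}

# Only run the loop if blastdbcmd is available in this shell.
if command -v blastdbcmd >/dev/null 2>&1; then
    for db in nt nr; do
        blastdb_date "$db"
    done
else
    echo "blastdbcmd not found; load a BLAST module first" >&2
fi
```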
