Some of the most common biological sequence databases are available on the HPC for use with tools like BLAST. The table below lists them, their locations on the system, and when they were last updated.
We endeavor to keep this list updated as the One True List™.
|Name||Version Number||Last Updated¹||Database Location||How to Use|
|NCBI nt nucleotide collection||N/A||2018-11-24|| ||Use|
|NCBI nr protein collection||N/A||2018-08-16|| ||Use|
|UniProt's UniProtKB/Swiss-Prot (manually curated, most reliable)||N/A||?|| ||Use|
|UniProt's UniProtKB/TrEMBL (automated curation)||N/A||?|| ||Use|
|UniProt's UniRef100||N/A||?|| ||Use|
To use these databases you generally need to set an environment variable pointing to the location of the database before running your program. For example, to use the NCBI nt database with blastn:
$ export BLASTDB=/export/data/bio/ncbi/blast/db
$ blastn -db nt -query file.seq -out blast.out
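In a job script it can help to confirm that the database directory exists before setting BLASTDB, so that a wrong path fails early with a clear message. The following is a minimal sketch only: the helper name `check_blastdb_dir` is hypothetical (it is not part of BLAST), and the path in the usage comment is simply the one from the example above.

```shell
# Hypothetical helper (not part of BLAST): succeed only if the given
# directory exists and is readable; otherwise print a message to stderr.
check_blastdb_dir() {
    if [ -d "$1" ] && [ -r "$1" ]; then
        return 0
    fi
    echo "BLAST database directory not found or unreadable: $1" >&2
    return 1
}

# Usage, assuming the path from the example above:
#   check_blastdb_dir /export/data/bio/ncbi/blast/db \
#       && export BLASTDB=/export/data/bio/ncbi/blast/db
```

The check covers only the directory itself; it does not verify that a particular database such as nt is present inside it.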
¹ Use the following to determine the date of a BLAST database:
/export/apps/blast/2.7.1+/bin/blastdbcmd -info -db nt | grep Date
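The line matched by `grep Date` may carry other fields after the date. As a rough sketch, the date alone could be extracted with `sed`, assuming the line looks like `Date: Nov 24, 2018  3:35 PM` followed by further text; that format is an assumption and may differ between BLAST+ versions. The function name `blast_db_date` is hypothetical.

```shell
# Hypothetical helper: read `blastdbcmd -info` output on stdin and print
# only the "Mon DD, YYYY" part of the line that starts with "Date: ".
# Assumes a line of the form "Date: Nov 24, 2018  3:35 PM ...".
blast_db_date() {
    sed -n 's/^Date: \([A-Za-z]* [0-9]*, [0-9]*\).*/\1/p'
}

# Usage, with the blastdbcmd path given above:
#   /export/apps/blast/2.7.1+/bin/blastdbcmd -info -db nt | blast_db_date
```

If the `Date:` line has a different layout, the function prints nothing, so an empty result is a sign to fall back to the plain `grep Date` command.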