BLAST+, BLAST
The BLAST (Basic Local Alignment Search Tool) library is a collection of software tools and algorithms developed by the National Center for Biotechnology Information (NCBI).
BLAST is widely used in bioinformatics for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA sequences.
BLAST+ refers to an enhanced version of the original BLAST.
Usage
Programs
Command line examples
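A minimal sketch of a nucleotide search is shown here. The file names in the example call are hypothetical, and the wrapper function run_blastn is only an illustration, not part of BLAST+. The options themselves (-query, -db, -out, -outfmt, -evalue, -num_threads) are standard blastn options; -outfmt 6 selects tabular output, and the thread count should match the number of CPUs reserved for the job.

```shell
# Search a nucleotide query against a nucleotide database.
# Arguments: query FASTA file, database path prefix, output file.
run_blastn() {
    blastn -query "$1" -db "$2" -out "$3" \
           -outfmt 6 -evalue 1e-5 -num_threads 4
}

# Example call (hypothetical file names):
# run_blastn query.fasta /storage/projects/BlastDB/nt results.tsv
```

The same pattern should carry over to blastp or tblastx by swapping the program name and pointing -db at a database of the matching type.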
Efficiency
BLAST does not make effective use of most of the CPUs reserved for a job. Run export BATCH_SIZE=3000000 before any BLAST command (e.g. blastn, blastp) and it will run much faster.
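In a batch script, the setting might be placed as in the sketch below. The check on the last line is optional and only confirms that child processes, such as the BLAST programs, inherit the variable.

```shell
# Export the variable so that every BLAST process started later inherits it.
export BATCH_SIZE=3000000

# Optional check: the variable should be visible to child processes.
sh -c 'echo "BATCH_SIZE=$BATCH_SIZE"'
```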
Databases
We maintain a local copy of the BLAST databases in the /storage/projects/BlastDB directory. The databases are ready to use.
- For short or single-query jobs, you can use the databases directly in storage and refer to them from the batch script by their full path, i.e. /storage/projects/BlastDB/DB_NAME_PREFIX.
- If you run a longer job, multiple queries, or multiple jobs with a particular DB, it is more efficient to copy the database to the scratch directory.
In both cases, refer to the database (the -db option) in your blastn/blastp/tblastx job by its base name only (e.g. nt, nr, wgs, refseq_genomic), without any file extension. For example: -db /storage/projects/BlastDB/nt.
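The copy-to-scratch variant can be sketched as follows. It assumes that the batch system exposes the scratch directory as SCRATCHDIR (an assumption, not stated above) and that every file of a database shares the base name as a prefix. The helper name stage_db is hypothetical.

```shell
# Copy all files of one database to a working directory and print the
# value to pass to -db (directory plus base name, no file extension).
# Arguments: source directory, database base name, destination directory.
stage_db() {
    cp "$1/$2".* "$3"/ || return 1
    echo "$3/$2"
}

# Example use in a job script (SCRATCHDIR is assumed to be set by the batch system;
# query.fasta and results.tsv are hypothetical file names):
# DB=$(stage_db /storage/projects/BlastDB nt "$SCRATCHDIR")
# blastn -query query.fasta -db "$DB" -out results.tsv
```

Printing the path prefix keeps the -db value in one place, so the same job script can switch between the storage copy and the scratch copy by changing a single line.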
All available databases are described on the NCBI website, and we mirror all of them. If you need a database updated or a new one added, please contact user support at meta@cesnet.cz.
Warning
Network load optimization
If you need to run several BLAST jobs with the same database, we ask users to reduce the network load by copying the database only once and using it for all the jobs running on the same node.
This can be done by inserting the following construct into the batch script:
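One possible shape for such a construct is sketched below. It assumes a node-local directory that all jobs on the node can write to (the cache path in the example call is hypothetical) and relies on mkdir being atomic, so that only the first job performs the copy. The function name cache_db_once is illustrative and is not part of BLAST or the batch system.

```shell
# Copy a database to a node-local cache unless another job already did.
# Arguments: source directory, database base name, cache directory.
cache_db_once() {
    src_dir=$1; db_name=$2; cache_dir=$3
    lock="$cache_dir/.$db_name.lock"
    done_marker="$cache_dir/.$db_name.done"
    mkdir -p "$cache_dir" || return 1
    if mkdir "$lock" 2>/dev/null; then
        # mkdir is atomic, so exactly one job on the node gets here and copies.
        cp "$src_dir/$db_name".* "$cache_dir"/ && touch "$done_marker" || return 1
    else
        # Another job is copying or has already finished; wait for its marker.
        while [ ! -e "$done_marker" ]; do sleep 10; done
    fi
    # Print the value to pass to -db.
    echo "$cache_dir/$db_name"
}

# Example use (the cache path is hypothetical):
# DB=$(cache_db_once /storage/projects/BlastDB nt /scratch/shared/blastdb)
# blastn -query query.fasta -db "$DB" -out results.tsv
```

This sketch does not recover from a failed copy and never removes the cached files, so a production version would need a timeout for waiting jobs and a clean-up step.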