
Kraken

module avail kraken/
module avail kraken2/

Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing the given k-mer. The k-mer assignments inform the classification algorithm.

Database

Public Kraken 2 databases (originally from https://benlangmead.github.io/aws-indexes/) are available in a shared location. To use one of them, pass its directory to the --db option:

kraken2 --db /storage/projects-du-praha/Bio_databases/kraken2/CHOSEN_DATABASE
# includes the NCBI Taxonomy database

Request enough memory to hold the entire database, for example at least mem=900gb for the kraken2_nt_20240530 database; run du -h on the database directory to check its size. The exception is a short query processed with the --memory-mapping option, which avoids loading the database into RAM.
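The size check can be scripted. The following is a minimal sketch, not an official tool: the helper names kb_to_gb and db_size_gb are hypothetical, CHOSEN_DATABASE is the same placeholder as above, and it assumes that the scheduler's gb unit is close enough to GiB for sizing purposes. It uses du -sk (one summary line in KiB) instead of du -h so that the result can be used in arithmetic.

```shell
# Convert a size in KiB to whole GiB, rounding up (1 GiB = 1048576 KiB).
kb_to_gb() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Total size of a directory in whole GiB; du -sk prints one summary line in KiB.
db_size_gb() {
    kb_to_gb "$(du -sk "$1" | cut -f1)"
}

# Placeholder path from the text above; replace CHOSEN_DATABASE with a real name.
DB=/storage/projects-du-praha/Bio_databases/kraken2/CHOSEN_DATABASE
if [ -d "$DB" ]; then
    echo "request at least mem=$(db_size_gb "$DB")gb"
else
    echo "directory not found: $DB"
fi
```

The printed value is only a lower bound for the database itself; the job also needs some memory for the query data and the program.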

For fast access to the largest databases (e.g. kraken2_nt_20240530, k2_gtdb_genome_reps_20250609, k2_core_nt_20250609), we recommend adding the qsub requirement cluster=turin, which selects the machines with the fastest network connection to the database storage.
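As an illustration, the memory and cluster requirements can be combined into one resource request. This is a sketch under assumptions: it presumes the PBS select syntax with cluster given as a chunk-level resource, the CPU count and walltime are arbitrary example values, and kraken2_job.sh is a hypothetical job script name. The command is printed for review rather than submitted.

```shell
# Example values; only MEM and CLUSTER come from the text above.
NCPUS=16              # assumption: adjust to your workload
MEM=900gb             # at least this much for kraken2_nt_20240530
WALLTIME=12:00:00     # assumption: adjust to your workload
CLUSTER=turin

# Assemble the chunk request and the full qsub command line.
RESOURCES="select=1:ncpus=${NCPUS}:mem=${MEM}:cluster=${CLUSTER}"
QSUB_CMD="qsub -l ${RESOURCES} -l walltime=${WALLTIME} kraken2_job.sh"

# Print the command instead of running it, so it can be checked first.
echo "$QSUB_CMD"
```

Keeping the values in variables makes it easy to change the memory request when switching to a smaller database.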
