
CUDA

module avail cuda/

CUDA (Compute Unified Device Architecture) libraries are software libraries developed by NVIDIA for parallel computing on NVIDIA GPUs (Graphics Processing Units).

These libraries provide optimized functions and algorithms that leverage the parallel processing power of GPUs for various computational tasks.

License

To use CUDA, you will also need to accept the license for the cuDNN library.

Usage

GPU clusters

Cluster                  Nodes                        GPUs per node            Compute capability  Mem [MiB]  cuDNN
adan.grid.cesnet.cz      adan[1-61].grid.cesnet.cz    2x Tesla T4              7.5                 15 109     YES
bee.cerit-sc.cz          bee[1-10].cerit-sc.cz        2x H100 NVL              9.0                 95 830     YES
cha.natur.cuni.cz        cha.natur.cuni.cz            8x GeForce RTX 2080 Ti   7.5                 11 019     YES
fau.natur.cuni.cz        fau[1-3].natur.cuni.cz       8x Quadro RTX 5000       7.5                 16 125     YES
fer.natur.cuni.cz        fer[1-3].natur.cuni.cz       8x RTX A4000             8.6                 16 117     YES
galdor.metacentrum.cz    galdor[1-20].metacentrum.cz  4x A40                   8.6                 45 634     YES
gita.cerit-sc.cz         gita[1-7].cerit-sc.cz        2x GeForce RTX 2080 Ti   7.5                 11 019     YES
glados.cerit-sc.cz       glados1.cerit-sc.cz          1x TITAN V               7.0                 12 066     YES
glados.cerit-sc.cz       glados[2-7].cerit-sc.cz      2x GeForce RTX 2080      7.5                 7 982      YES
glados.cerit-sc.cz       glados[10-13].cerit-sc.cz    2x GTX 1080 Ti           6.1                 11 178     YES
grimbold.metacentrum.cz  grimbold.metacentrum.cz      2x Tesla P100            6.0                 12 198     YES
konos.fav.zcu.cz         konos[1-8].fav.zcu.cz        4x GeForce GTX 1080 Ti   6.1                 11 178     YES
luna2022.fzu.cz          luna[201-206].fzu.cz         1x A40                   8.6                 45 634     YES
zia.cerit-sc.cz          zia[1-5].cerit-sc.cz         4x A100                  8.0                 40 536     YES

GPU jobs

  • GPU queues: gpu (24 hours max) and gpu_long (up to 336 hours); both are open to all MetaCentrum members.
  • GPU jobs on the konos cluster can also be run via the priority queue iti (a queue for users from ITI, the Institute of Theoretical Informatics, University of West Bohemia).
  • The zubat cluster is available for any job that runs for at most 24 hours.
  • Users from CEITEC MU and NCBR can run jobs via privileged queues on the zubat cluster.
  • The current version of the CUDA drivers (parameter cuda_version) can be checked interactively in the qsub command assembler.

Requesting GPUs

The key scheduling constraint is to prevent jobs from sharing GPUs. To ensure this, always use the ngpus=X flag in qsub and request one of the GPU queues (gpu, gpu_long, iti).

qsub -l select=1:ncpus=1:mem=10gb:ngpus=X -q gpu

where X is the number of GPU cards required. By default,

resources_default.gpu=1

If a job tries to use more GPU cards than it requested (or than are available), the prolog will not run it.

To schedule your job on clusters with a certain compute capability, use a qsub command like this:

qsub -q gpu -l select=1:ncpus=1:ngpus=X:gpu_cap=cuda35 <job batch file>

Using the PBS parameter gpu_mem, it is possible to specify the minimum amount of memory that the GPU card must have.

qsub -q gpu -l select=1:ncpus=1:ngpus=1:gpu_mem=10gb ...

Example

qsub -I -q gpu -l select=1:ncpus=1:ngpus=1:scratch_local=10gb:gpu_mem=10gb -l walltime=24:0:0

This interactive job requests one machine with 1 CPU, 1 GPU card, and 10 GB of local scratch for 24 hours.
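The same resources can be requested from a batch script instead. A minimal sketch is below; the application name is hypothetical, and the exact CUDA module version should be picked from the output of `module avail cuda/`:

```shell
#!/bin/bash
#PBS -q gpu
#PBS -l select=1:ncpus=1:ngpus=1:mem=10gb:scratch_local=10gb:gpu_mem=10gb
#PBS -l walltime=24:0:0

# Load a CUDA toolkit module (choose a concrete version from `module avail cuda/`)
module load cuda

# nvidia-smi lists the GPU card(s) the scheduler assigned to this job
nvidia-smi

# ... run your GPU application here, e.g. ./my_gpu_app (hypothetical name)
```

Submit it with `qsub <job batch file>`; the #PBS directives replace the command-line -q and -l options.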

FAQs

Q: How can I recognize which GPUs are reserved for me by the scheduling system?

A: The IDs of the reserved GPU cards are stored in the CUDA_VISIBLE_DEVICES variable. These IDs are mapped to the virtual IDs used by CUDA tools: for example, if CUDA_VISIBLE_DEVICES contains 2,3, CUDA tools will report the IDs as 0,1.
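The remapping can be illustrated with a short snippet; the exported value below is only an example of what the scheduler might set for a two-GPU job:

```shell
# Example: suppose the scheduler reserved physical cards 2 and 3 for the job
export CUDA_VISIBLE_DEVICES=2,3

# Count the GPUs the job may use by splitting the comma-separated list
NGPUS=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)

# CUDA tools will see these cards as virtual devices 0..NGPUS-1
echo "reserved $NGPUS GPU(s); CUDA numbers them 0..$((NGPUS - 1))"
# prints: reserved 2 GPU(s); CUDA numbers them 0..1
```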

Q: I want to use the NVIDIA cuDNN library; which GPU clusters support it?

A: Those with GPUs of compute capability greater than 3.0, which means all of the clusters listed in the table above.