
GPU computing

GPU job

To run a GPU calculation, the user needs to:

  1. specify the number of GPU cards (parameter ngpus), and
  2. choose one of the GPU queues explicitly.

The name of the GPU queue must be specified

Unlike normal jobs, GPU jobs are not routed into an appropriate queue according to the ngpus parameter alone. The name of the queue (parameter -q) has to be specified, too.

GPU queue name    Walltime range
                  00:00:00 - 24:00:00
                  00:00:00 - 336:00:00
                  00:00:00 - 24:00:00

GPU jobs on the konos cluster can also be run via the priority queue (a queue for users from ITI, the Institute of Theoretical Informatics, Univ. of West Bohemia).

Example of an interactive GPU job:

qsub -I -q gpu -l select=1:ncpus=1:ngpus=1:scratch_local=10gb -l walltime=24:0:0

Specific PBS resources

gpu mem

The PBS parameter gpu_mem specifies the minimum amount of memory that the allocated GPU card must have.

qsub -q gpu -l select=1:ncpus=1:ngpus=1:gpu_mem=10gb ...


gpu cap

The PBS parameter gpu_cap is the CUDA compute capability, as defined on this page.


cuda version

The PBS parameter cuda_version is the version of CUDA installed.
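As an illustration of how these GPU-specific resources fit together, the sketch below (plain shell, all values are examples only) composes the -l select statement that combines ngpus, gpu_mem, gpu_cap and cuda_version in a single qsub command:

```shell
# Example resource values (illustrative only; pick values matching your job)
NGPUS=1
GPU_MEM="10gb"
GPU_CAP="cuda60"
CUDA_VERSION="11.0"

# Compose the -l select statement combining the GPU-specific PBS resources
SELECT="select=1:ncpus=1:ngpus=${NGPUS}:gpu_mem=${GPU_MEM}:gpu_cap=${GPU_CAP}:cuda_version=${CUDA_VERSION}"

# The resulting submission command
echo "qsub -q gpu -l ${SELECT} -l walltime=4:00:00"
```

All four parameters may be mixed freely in one select statement; only ngpus and the queue name are mandatory for a GPU job.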

Specific system variables

The IDs of the assigned GPU cards are stored in the CUDA_VISIBLE_DEVICES variable.

These IDs are mapped to the virtual IDs used by CUDA tools. Thus, if CUDA_VISIBLE_DEVICES contains the values 2, 3, CUDA tools will report IDs 0, 1.
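In other words, the virtual IDs seen by CUDA tools are simply 0..N-1, where N is the number of entries in CUDA_VISIBLE_DEVICES. A small bash sketch of the mapping (no GPU needed; the variable value is illustrative):

```shell
# Pretend PBS assigned us physical cards 2 and 3 (illustrative value;
# in a real job PBS sets CUDA_VISIBLE_DEVICES for you)
CUDA_VISIBLE_DEVICES="2,3"

# Split the comma-separated list and print the virtual ID each card gets
IFS=',' read -ra CARDS <<< "$CUDA_VISIBLE_DEVICES"
for i in "${!CARDS[@]}"; do
  echo "physical ID ${CARDS[$i]} -> virtual ID $i"
done
```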

NVIDIA GPU Cloud

NVIDIA provides GPU-tuned frameworks for deep learning packaged as Docker containers under the NVIDIA GPU Cloud (NGC).

NGC containers are released monthly (versioned YY.MM). You can find the changelog and the HW/driver support matrix in the Support Matrix.

NGC images are available on Docker Hub and are also saved as Singularity images in a CVMFS instance mounted on /cvmfs/ The Singularity images stored on CVMFS are faster to use than running a container directly from Docker Hub, because in the latter case the image first has to be rebuilt into the Singularity image format.

Deep learning frameworks documentation:

Run as Singularity image

To use the local CVMFS instance, first trigger the automount of the filesystem with the command:

ls /cvmfs/

Currently there are NGC images of TensorFlow v1/v2 and PyTorch in several versions, which require at least compute capability 6.0 and CUDA version 11. This means you have to use gpu_cap=cuda60 and cuda_version=11.0 in your PBS job.

In general, Singularity is run as singularity run image.SIF or singularity shell image.SIF, plus other options.

  • run launches the container; in the case of the frameworks, a Jupyter Notebook or JupyterLab is usually available
  • shell launches an interactive shell
  • exec runs a particular command

From CVMFS as interactive job

Start an interactive job:

qsub -I -l select=1:mem=16gb:scratch_local=10gb:ngpus=1:gpu_cap=cuda60:cuda_version=11.0 -q gpu -l walltime=4:00:00

Run the Singularity image with a shell:

singularity shell --nv /cvmfs/\:20.09-py3.SIF

You will get a shell inside the container, ready to run commands, e.g.:

Singularity> python -c  'import torch; print(torch.cuda.get_device_properties(0))'
_CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15109MB, multi_processor_count=40)

From CVMFS as regular batch job

Prepare a script to run a calculation within the PyTorch image, e.g.:

#!/bin/bash
#PBS -N PyTorch_Job
#PBS -q gpu
#PBS -l select=1:ncpus=1:mem=16gb:scratch_local=10gb:ngpus=1:gpu_cap=cuda60
#PBS -l walltime=4:00:00
#PBS -m ae
singularity run --nv /cvmfs/\:20.09-py3.SIF /your/work_dir/

Submit the script in the usual way, e.g. (assuming the script above is saved as pytorch_job.sh, a name chosen here only for illustration):

qsub pytorch_job.sh
From Docker Hub as interactive job

It is also possible to run NGC images directly from Docker Hub in an interactive job.

qsub -I -l select=1:mem=16gb:scratch_local=10gb:ngpus=1:gpu_cap=cuda60:cuda_version=11.0 -q gpu -l walltime=4:00:00

Within the interactive job, first create a tmp directory within the scratch directory and point the SINGULARITY_TMPDIR variable to it; the default /tmp has a limited quota.

mkdir $SCRATCHDIR/tmp
export SINGULARITY_TMPDIR=$SCRATCHDIR/tmp

Then run Singularity with the Docker Hub URL:

singularity shell --nv docker://

Re-running image from Singularity cache

During the first run, the layers of the image are downloaded into a cache and the image is built. From the second run onward, Singularity will start the image from the cache whenever possible.

By default, the layers are cached in the ~/.singularity/cache/ directory.

You can clean the cache with singularity cache clean.
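A small sketch of how the cache location is resolved, assuming the standard Singularity behaviour where the SINGULARITY_CACHEDIR environment variable (also used in the build script later in this page) overrides the default:

```shell
# Illustrative only: on a real job, SCRATCHDIR is set by PBS
SCRATCHDIR="${SCRATCHDIR:-/scratch/example_job}"

# Default cache location when SINGULARITY_CACHEDIR is not set
unset SINGULARITY_CACHEDIR
echo "default cache: ${SINGULARITY_CACHEDIR:-$HOME/.singularity/cache}"

# Redirect the cache to scratch to avoid filling the home quota
export SINGULARITY_CACHEDIR="$SCRATCHDIR/singularity_cache"
echo "cache now at:  $SINGULARITY_CACHEDIR"
```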


  • Directories /storage, /auto and home are bound inside the container by default.
  • Add the --nv argument to singularity for GPU computing.
  • Versions of NGC images requiring a CUDA version higher than 11.0 are not supported (see and search for cuda_version).
  • Customizing images is possible; see Singularity.
  • If you would like to add an NGC image or an image of your own to CVMFS, write a ticket to

Get NGC API key

If you have not done so already, you first need to register at to get an NGC API key. After you log in, you can find this API key under the Setup menu in your personal tab.

Build Singularity image

Building the image is a resource-intensive process and must be done as an interactive job with a large enough scratch (at least 10 GB). Some temporary directories are by default bound to /tmp, which has a limited user quota on MetaCentrum. Therefore you should bind them to the scratch directory instead.

Example of a script to build an image from NGC:

#!/bin/bash
export NGCDIR="/storage/brno2/home/melounova/ngc_sandbox" # directory where the image will go
export SINGULARITY_DOCKER_PASSWORD=Yj..........Az # API key you get after logging in at
export SINGULARITY_CACHEDIR="/storage/brno2/home/melounova/.singularity" # the cache dir must exist
mkdir $SCRATCHDIR/tmp
export SINGULARITY_TMPDIR=$SCRATCHDIR/tmp # keep temporary build files on scratch instead of /tmp

singularity -v build $NGCDIR/TensorFlow.simg docker:// # build the image TensorFlow.simg

It is possible to create custom images derived from NGC images; see Singularity for how to do it.

Job scripts for JupyterLab

We have prepared job scripts for JupyterLab inside containers with TensorFlow and PyTorch. See the scripts in /cvmfs/:

  • # run JupyterLab with TensorFlow v.1
  • # run JupyterLab with TensorFlow v.2

In the header of each script, you can change the PBS parameters. Run the selected script in PBS, e.g.:


After the job starts, you will get an email with the URL and the password of the running JupyterLab instance.

GPU clusters

Cluster    Node(s)         GPU card specification     gpu_mem [MB]   Compute capability   gpu_cap          cuda_version
adan       adan[1-61]      2x Tesla T4                15 109         7.5                  cuda35,cuda61,
black                      4x Tesla P100              16 280         6.0                  cuda35, cuda60   11.2
galdor     galdor[1-20]    4x A40                     45 634         8.6                  cuda35,cuda61,
glados                     1x TITAN V GPU             12 066         7.0                  cuda35,cuda61,
glados     glados[2-7]     2x GeForce RTX 2080        7 982          7.5                  cuda35,cuda61,
glados     glados[11-13]   2x 1080Ti GPU              11 178         6.1                  cuda35,cuda61    11.2
luna       luna[201-206]   1x A40                     45 634         8.6                  cuda35,cuda61,
fer        fer[1-3]        8x RTX A4000               16 117         8.6                  cuda35,cuda61,
zefron                     1x A10                     22 731         8.6                  cuda35,cuda61,
zefron                     1x GeForce GTX 1070        8 119          3.5                  cuda35, cuda61   11.2
zefron                     1x Tesla K40c              11 441         3.5                  cuda35           11.2
zia        zia[1-5]        4x A100                    40 536         8.0                  cuda35,cuda61,
fau        fau[1-3]        8x Quadro RTX 5000         16 125         7.5                  cuda35,cuda61,
cha                        8x GeForce RTX 2080 Ti     11 019         7.5                  cuda35,cuda61,
gita       gita[1-7]       2x GeForce RTX 2080 Ti     11 019         7.5                  cuda35,cuda61,
konos      konos[1-8]      4x GeForce GTX 1080 Ti     11 178         6.1                  cuda35,cuda61    11.2
grimbold                   2x Tesla P100              12 198         6.0                  cuda35, cuda60   11.2