Scratch storages

Scratch storage is a storage for temporary files for running job.

This storage should be used only during computations and should be freed immediately after your job ends.

The location of scratch directory is defined by a system variable SCRATCHDIR.

Scratch types

We offer four types of scratch storage:

Local scratch on node

PBS resource scratch_local
available on every node, located on a regular hard disc
choose this type as a default if you have no reason to do otherwise
integer type, submitted as scratch_local=10gb
located in /scratch/USERNAME/job_JOBID

Fast SSD scratch

PBS resource scratch_ssd
located on a small SSD disc
ultra fast (compared to local scratch) but smaller in volume
integer type, submitted as scratch_ssd=1gb
not available on all computational nodes!
to check for availability on a particular node, go to https://metavo.metacentrum.cz/pbsmon2/props -> choose a node -> search for scratch_ssd in a grey table
recommended in jobs where the bottleneck is disc-related operations (applications that create/read a lot of files)
located in /scratch.ssd/USERNAME/job_JOBID

Shared scratch

PBS resource scratch_shared
network volume, which is shared between all nodes in a given cluster
read/write operation slower than on local scratch
useful if you need to run more than one application that needs access to the same data
integer type, submitted as scratch_shared=10gb
not available on all computational nodes!
to check for availability on a particular node, go to https://metavo.metacentrum.cz/pbsmon2/props -> choose a node -> search for scratch_shared in a grey table
mounted to directory /scratch.shared/USERNAME/job_JOBID

Scratch in RAM

PBS resource scratch_shm
the scratch directory is in the RAM
fastest, but data on scratch do not survive the end/failure of the job
use when you need ultra-fast scratch AND when you absolutely don’t care about data from failed/killed/ended jobs
boolean type, submitted as scratch_shm=true
maximum size of scratch is defined by the mem (memory) parameter
remember to choose memory large enough (to hold both data in scratch and the actual memory requirements for the job)
mounted to directory /dev/shm/scratch.shm/USERNAME/job_JOBID

Shared scratch on cluster bee.cerit-sc.cz

BeeGFS (Beyond Extensible Enterprise File System) is a parallel distributed filesystem designed specifically for the needs of high-performance computing (HPC). It is used in computing clusters, scientific simulations, machine learning, genomics, and everywhere large datasets and fast parallel access are essential.

At MetaCentrum, we’ve adopted BeeGFS to meet the increasing challenges of data-intensive computations on data used by multiple job and/or job arrays. BeeGFS is available as a temporary working directory via the scratch_shared resource on cluster bee.cerit-sc.cz.

Main usecases:

PBS resource scratch_shared together with cl_bee=True,for example qsub -l walltime=1:0:0 -q default@pbs-m1.metacentrum.cz -l select=1:ncpus=1:mem=400mb:scratch_shared=400mb:cl_bee=True,
jobs with large files or a huge number of small files; BeeGFS efficiently handles massive datasets,
jobs with parallel I/O operation
jobs spanning multiple compute nodes
array jobs with intermediate results - BeeGFS is well-suited for workflows where subsequent computations can pick up intermediate results left in the scratch directory eliminating the need to copy data to permanent storage or run on the same machine as the previous step.

Read more on BeeGFS article on eInfra blog.

No default scratch

For a batch job, you must set the size and type of scratchdir! There is no default type of scratch.

Cleaning scratch

Directory SCRATCHDIR is not writable, only it’s content is. Therefore, you cannot, e.g. do rm -rf $SCRATCHDIR, but you can rm -rf $SCRATCHDIR/*.

Users should always clear the content of the scratch directory after the job ends to free disc space. Otherwise, this directory will be automatically deleted after 14 days at most (earlier if there is lack of space on disks).

Examples

Submit batch job with 100 GB scratch on local disc:

qsub -l select=1:ncpus=1:mem=4gb:scratch_local=100gb

Submit the interactive job with 20 GB memory and scratch in RAM:

qsub -I -l select=1:ncpus=1:mem=20gb:scratch_shm=true

Submit batch job with 1 GB of scratch on SSD disc:

qsub -l select=1:ncpus=1:mem=4gb:scratch_ssd=1gb

System variables

SCRATCHDIR
    location of the scratch directory
    echo $SCRATCHDIR
SCRATCH_TYPE
    type of scratch directory
    echo $SCRATCH_TYPE
SCRATCH_VOLUME
    size of the scratch directory
    echo $SCRATCH_VOLUME

Scratch storages

On this page