Scratch storages
Scratch storage is a storage for temporary files for running job.
This storage should be used only during computations and should be freed immediately after your job ends.
The location of scratch directory is defined by a system variable SCRATCHDIR
.
Scratch types
We offer four types of scratch storage:
Local scratch on node
- PBS resource
scratch_local
- available on every node, located on a regular hard disc
- choose this type as a default if you have no reason to do otherwise
- integer type, submitted as
scratch_local=10gb
- located in
/scratch/USERNAME/job_JOBID
Fast SSD scratch
- PBS resource
scratch_ssd
- located on a small SSD disc
- ultra fast (compared to local scratch) but smaller in volume
- integer type, submitted as
scratch_ssd=1gb
- not available on all computational nodes!
- to check for availability on a particular node, go to https://metavo.metacentrum.cz/pbsmon2/props -> choose a node -> search for
scratch_ssd
in a grey table - recommended in jobs where the bottleneck is disc-related operations (applications that create/read a lot of files)
- located in
/scratch.ssd/USERNAME/job_JOBID
Shared scratch
- PBS resource
scratch_shared
- network volume, which is shared between all nodes in a given cluster
- read/write operation slower than on local scratch
- useful if you need to run more than one application that needs access to the same data
- integer type, submitted as
scratch_shared=10gb
- not available on all computational nodes!
- to check for availability on a particular node, go to https://metavo.metacentrum.cz/pbsmon2/props -> choose a node -> search for scratch_shared in a grey table
- mounted to directory
/scratch.shared/USERNAME/job_JOBID
Scratch in RAM
- PBS resource
scratch_shm
- the scratch directory is in the RAM
- fastest, but data on scratch do not survive the end/failure of the job
- use when you need ultra-fast scratch AND when you absolutely don’t care about data from failed/killed/ended jobs
- boolean type, submitted as
scratch_shm=true
- maximum size of scratch is defined by the mem (memory) parameter
- remember to choose memory large enough (to hold both data in scratch and the actual memory requirements for the job)
- mounted to directory
/dev/shm/scratch.shm/USERNAME/job_JOBID
Shared scratch on cluster bee.cerit-sc.cz
BeeGFS (Beyond Extensible Enterprise File System) is a parallel distributed filesystem designed specifically for the needs of high-performance computing (HPC). It is used in computing clusters, scientific simulations, machine learning, genomics, and everywhere large datasets and fast parallel access are essential.
At MetaCentrum, we’ve adopted BeeGFS to meet the increasing challenges of data-intensive computations on data used by multiple job and/or job arrays. BeeGFS is available as a temporary working directory via the scratch_shared
resource on cluster bee.cerit-sc.cz.
Main usecases:
- PBS resource
scratch_shared
together withcl_bee=True
,for exampleqsub -l walltime=1:0:0 -q default@pbs-m1.metacentrum.cz -l select=1:ncpus=1:mem=400mb:scratch_shared=400mb:cl_bee=True
, - jobs with large files or a huge number of small files; BeeGFS efficiently handles massive datasets,
- jobs with parallel I/O operation
- jobs spanning multiple compute nodes
- array jobs with intermediate results - BeeGFS is well-suited for workflows where subsequent computations can pick up intermediate results left in the scratch directory eliminating the need to copy data to permanent storage or run on the same machine as the previous step.
Read more on BeeGFS article on eInfra blog.
No default scratch
For a batch job, you must set the size and type of scratchdir! There is no default type of scratch.
Cleaning scratch
Directory SCRATCHDIR
is not writable, only it’s content is. Therefore, you cannot, e.g. do rm -rf $SCRATCHDIR
, but you can rm -rf $SCRATCHDIR/*
.
Users should always clear the content of the scratch directory after the job ends to free disc space. Otherwise, this directory will be automatically deleted after 14 days at most (earlier if there is lack of space on disks).
Examples
Submit batch job with 100 GB scratch on local disc:
qsub -l select=1:ncpus=1:mem=4gb:scratch_local=100gb
Submit the interactive job with 20 GB memory and scratch in RAM:
qsub -I -l select=1:ncpus=1:mem=20gb:scratch_shm=true
Submit batch job with 1 GB of scratch on SSD disc:
qsub -l select=1:ncpus=1:mem=4gb:scratch_ssd=1gb
System variables
SCRATCHDIR
location of the scratch directory
echo $SCRATCHDIR
SCRATCH_TYPE
type of scratch directory
echo $SCRATCH_TYPE
SCRATCH_VOLUME
size of the scratch directory
echo $SCRATCH_VOLUME
Last updated on