
Storage Department services

The CESNET Storage Department provides various types of data services. They are available to all users with a MetaCentrum login and password.

This page describes the Storage Department data policies only to a certain level. For more detailed information, users should refer to the Storage Department documentation pages.

Data storage technology in the Data Storage Department changed in May 2024

For a long time, the data were stored on hierarchical storage machines ("HSM" for short) with a directory structure accessible from /storage/du-cesnet.
As part of a technological upgrade of the operated systems, the HSM storages were disconnected and decommissioned. User data have been transferred to machines with Object storage technology.
Object storage is the successor of HSM with a slightly different set of commands, i.e. it does not work in the same way.

Object storage

S3 storage is available to all MetaCentrum users. You can generate your credentials via the Gatekeeper service, where you select your MetaCentrum account and obtain your access_key and secret_key.

Simple storage - use when you simply need to store your data

You can use the S3 storage as simple storage for your data. Use your credentials to configure one of the supported S3 clients, such as s3cmd, s5cmd (recommended for large datasets) or rclone. A detailed tutorial on S3 client configuration can be found in the official Data Storage Department tutorials.
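As a quick check that your credentials work, you can list the buckets visible to your account from a frontend. A minimal sketch, assuming your keys are stored in ~/.aws/credentials under the profile profile-name and the s3.cl4.du.cesnet.cz endpoint used in the examples below:

# list all buckets visible to your account (verifies keys and endpoint)
s5cmd --credentials-file ~/.aws/credentials --profile profile-name --endpoint-url=https://s3.cl4.du.cesnet.cz ls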

Direct usage in the job file

You can add s5cmd and rclone commands directly into your job file.

Bucket creation

Do not forget that the bucket used for staging MUST already exist on the remote S3 data storage. If you try to stage-out your data into a non-existent bucket, the job will fail, so you need to prepare the stage-out bucket in advance. You can follow the official Data Storage Department tutorials for your particular S3 client, or use the sketch below.
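For illustration, a bucket can be created ahead of time from a frontend. A minimal sketch, assuming the profile-name and endpoint used elsewhere in this guide and a hypothetical bucket name my-bucket:

# create the bucket once, before submitting jobs that stage data into it
s5cmd --credentials-file ~/.aws/credentials --profile profile-name --endpoint-url=https://s3.cl4.du.cesnet.cz mb s3://my-bucket

# the same with rclone, using the rclone.conf profile shown further below
rclone mkdir --config ~/.config/rclone/rclone.conf profile-name:my-bucket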

s5cmd

To use the s5cmd tool (preferred), you need to create a credentials file (copy the content below) in your home directory, e.g. /storage/brno2/home/<your-login-name>/.aws/credentials.

[profile-name]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXX
max_concurrent_requests = 200
max_queue_size = 20000
multipart_threshold = 128MB
multipart_chunksize = 32MB

Then you can continue to use s5cmd via the commands described in the Data Storage guide. Alternatively, you can add the following lines directly into your job file.

#define S3CRED, the path to the file where you stored your S3 credentials; the default location is your home directory
S3CRED=/storage/brno2/home/<your-login-name>/.aws/credentials

#stage in command for s5cmd
s5cmd --credentials-file "${S3CRED}" --profile profile-name --endpoint-url=https://s3.cl4.du.cesnet.cz cp s3://my-bucket/h2o.com ${DATADIR}/

#stage out command for s5cmd
s5cmd --credentials-file "${S3CRED}" --profile profile-name --endpoint-url=https://s3.cl4.du.cesnet.cz cp ${DATADIR}/h2o.out s3://my-bucket/
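Put together, the stage-in and stage-out commands might look as follows inside a batch script. A minimal sketch only, assuming a PBS job, a scratch directory in $SCRATCHDIR used as DATADIR, and a hypothetical computation that reads h2o.com and produces h2o.out:

#!/bin/bash
#PBS -l select=1:ncpus=1:mem=4gb:scratch_local=10gb
#PBS -l walltime=2:00:00

# hypothetical paths and resources; adjust to your own setup
S3CRED=/storage/brno2/home/<your-login-name>/.aws/credentials
DATADIR=$SCRATCHDIR

# stage in the input file from the S3 bucket
s5cmd --credentials-file "${S3CRED}" --profile profile-name --endpoint-url=https://s3.cl4.du.cesnet.cz cp s3://my-bucket/h2o.com ${DATADIR}/

# ... run your computation here, producing ${DATADIR}/h2o.out ...

# stage out the result back into the bucket
s5cmd --credentials-file "${S3CRED}" --profile profile-name --endpoint-url=https://s3.cl4.du.cesnet.cz cp ${DATADIR}/h2o.out s3://my-bucket/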

rclone

Alternatively, you can use the rclone tool, which is less convenient for large data sets. For large data sets (tens of terabytes), please use s5cmd as described above. For rclone, you need to create a credentials file (copy the content below) in your home directory, e.g. /storage/brno2/home/<your-login-name>/.config/rclone/rclone.conf.

[profile-name]
type = s3
provider = Ceph
access_key_id = XXXXXXXXXXXXXXXXXXX
secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXX
endpoint = s3.cl4.du.cesnet.cz
acl = private

Then you can continue to use rclone via the commands described in the Data Storage guide. Alternatively, you can add the following lines directly into your job file.

#define S3CRED, the path to the file where you stored your rclone configuration; the default location is your home directory
S3CRED=/storage/brno2/home/<your-login-name>/.config/rclone/rclone.conf

#stage in command for rclone
rclone sync --progress --fast-list --config ${S3CRED} profile-name:my-bucket/h2o.com ${DATADIR}

#stage out command for rclone
rclone sync --progress --fast-list --config ${S3CRED} ${DATADIR}/h2o.out profile-name:my-bucket/