Grid computing

Getting started

Welcome to MetaCentrum

MetaCentrum provides free computing resources to Czech academic institutions through distributed compute clusters.

New users unfamiliar with grid computing environments often have questions. Please contact user support if you need help; we’re here for you, and your feedback helps us improve the documentation.

Before getting started, you need an active MetaCentrum account (see the Account guide).

Getting started: Logging in

Once your account is activated, connect to MetaCentrum over SSH with your username. The example below uses the tarkil.metacentrum.cz login server (frontend). You can use any frontend, but we recommend choosing the one closest to your physical location to minimize network latency.

ssh your_username@tarkil.metacentrum.cz

For the full list of frontends, their locations, and detailed login instructions (including Windows/PuTTY), see the Log in guide.
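If you log in often, an entry in your ~/.ssh/config can shorten the command; the Host alias meta below is an arbitrary example, not an official name:

```
Host meta
    HostName tarkil.metacentrum.cz
    User your_username
```

With this in place, ssh meta is equivalent to the full command above.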

Understanding the architecture

The MetaCentrum distributed ecosystem consists of compute clusters (“compute nodes”, where the HPC computing itself is done), storage servers (space for user data, including users’ home directories), and frontends (nodes dedicated to logging in).

The operating system used within MetaCentrum is Debian Linux.

Each frontend has a native home directory on a storage server. After logging in, you can access any storage from a given frontend.

Frontends are shared by all users and are not intended for heavy computing. Use them only for data preparation, job management, or light compiling.

For detailed infrastructure information see Frontend & Storage guide.

Running jobs

There are two ways to run a job:

Batch job: non-interactive; you submit a script and it runs independently (the primary choice for grid computing).

Interactive job: you reserve resources, then work interactively (useful for testing, compiling, or running GUI applications).

Your first batch job

Example: create a file called batch_job_example.sh with the following content:

#!/bin/bash

# name of the job (optional)
#PBS -N batch_job_example                           

# reserve 1 CPU, 1 GB RAM, 1 GB disk space
#PBS -l select=1:ncpus=1:mem=1gb:scratch_local=1gb  

# limit the job runtime to a maximum of 1 hour
#PBS -l walltime=1:00:00                            

# name of output file
outfile=output.${PBS_JOBID}

# go to scratchdir 
cd ${SCRATCHDIR}

# create the output file
touch ${outfile}

# print out the basic info about the job 
echo -e "Hello world at $(date) from user ${USER}!\n" >> ${outfile}
echo -e "$PBS_JOBID is running on node $(hostname -f) in a scratch directory $SCRATCHDIR\n" >> ${outfile}

# copy the output file to the directory from where the
# job was submitted
cp ${outfile} ${PBS_O_WORKDIR}/

# apply a scratch automatic cleanup utility
# (though not needed in this case)
clean_scratch
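If the job is killed early (for example, for exceeding its walltime), the lines after the computation may never run and data can be left in the scratch directory. A common pattern is to register clean_scratch with a shell trap near the top of the job script. The sketch below uses a stand-in function in place of the real clean_scratch utility so it can run outside the cluster:

```shell
#!/bin/bash
# Sketch: ensure cleanup runs even if a job script is terminated early.
# On the cluster you would simply put this near the top of the script:
#   trap 'clean_scratch' TERM EXIT
# Here a stand-in function makes the sketch runnable anywhere.
cat > demo_trap.sh <<'EOF'
#!/bin/bash
clean_scratch() { echo "scratch cleaned" > cleanup.log; }  # stand-in
trap 'clean_scratch' TERM EXIT   # fire on early kill or normal exit
echo "doing work"
EOF
bash demo_trap.sh         # the trap fires when the script exits
cat cleanup.log           # prints: scratch cleaned
```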

Run the job as

qsub batch_job_example.sh  # submit job, returns job ID

After a short while, these three files will be created in the directory from which the job was submitted:

  • output.<jobID> (output file your script has explicitly created)
  • batch_job_example.o<jobID> (default file with standard output, STDOUT)
  • batch_job_example.e<jobID> (default file with standard error, STDERR; empty unless the job logged errors)
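The two default files mirror the script’s standard streams. The local sketch below (no cluster needed; the filenames and job ID are illustrative) shows how stdout and stderr end up in separate files:

```shell
#!/bin/bash
# Local illustration of how PBS separates a job's output streams:
# stdout goes to jobname.o<jobID>, stderr to jobname.e<jobID>.
cat > demo_job.sh <<'EOF'
#!/bin/bash
echo "normal output"       # would land in demo_job.o<jobID>
echo "an error" >&2        # would land in demo_job.e<jobID>
EOF
bash demo_job.sh > demo_job.o12345 2> demo_job.e12345
cat demo_job.o12345    # prints: normal output
cat demo_job.e12345    # prints: an error
```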

Your first interactive job

To run an interactive job for 1 hour with 1 CPU, 400 MB RAM, and 400 MB of scratch space:

$ qsub -I -l select=1:ncpus=1:mem=400mb:scratch_local=400mb -l walltime=1:00:00

After a while, the job starts and you will be moved to one of the compute nodes:

qsub: waiting for job 18359645.pbs-m1.metacentrum.cz to start
qsub: job 18359645.pbs-m1.metacentrum.cz ready

$ hostname
tarkil16.grid.cesnet.cz
$ echo $SCRATCHDIR
/scratch/username/job_18359645.pbs-m1
$ pwd
/storage/praha1/home/username

Notice that the hostname has changed (you are now on a compute node) and a dedicated scratch directory has been created for the job, while your working directory is still your home directory on a storage server.

Software modules

Software in MetaCentrum is installed as so-called modules. To use an application, you first have to load its module with the module add command.

module avail amber/       # list existing versions of "amber" module
module add amber          # load default version of amber
module list               # show loaded modules

Module search

You can use wildcards to search, e.g. module avail *python*. Append a trailing / to list the versions within a module directory.

Job management

qstat -u username          # list your running and queuing jobs
qstat -x -u username       # list your running, queuing and finished jobs

Status codes: Q=queued, R=running, F=finished

qdel jobID       # delete a job

Output files (in submission directory):

  • jobname.o<jobID> – standard output
  • jobname.e<jobID> – errors (check here first if a job fails)

Account maintenance

Renewal: Accounts expire on February 2nd. You’ll be notified by email.

Security: Use a strong password and never share credentials. For password changes and complete security rules, see Account page and Terms and conditions.

Next steps

Now that you understand the basics, you can explore the detailed guides referenced throughout this page.

Troubleshooting basics

If your job fails:

  1. Check the error file (*.e<jobID>)
  2. Verify your input files exist and have correct permissions
  3. Check if software modules are loaded correctly
  4. Ensure you requested adequate resources (memory, walltime, scratch)
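The first two checks can be scripted. The sketch below uses a hypothetical job ID and input filename, and simulates a failed job so it runs anywhere:

```shell
#!/bin/bash
# Post-mortem sketch for a failed job (jobid and filenames hypothetical).
jobid=12345
errfile="batch_job_example.e${jobid}"
echo "Error: input_data.txt not found" > "$errfile"  # simulate a failure

# 1. Check the error file (-s: exists and is non-empty)
if [ -s "$errfile" ]; then
    echo "Job $jobid logged errors:"
    cat "$errfile"
fi

# 2. Verify the input file exists and is readable
if [ ! -r input_data.txt ]; then
    echo "input_data.txt is missing or unreadable"
fi
```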

For more advanced troubleshooting, see the Advanced guide.

Acknowledgements

Publications created with MetaCentrum support must be submitted to the publications system and acknowledged.

See Terms and conditions for the up-to-date acknowledgement wording.
