Snakemake
Snakemake is not a module
Snakemake is not installed system-wide as a module. This page collects a few best practices for handling data in Snakemake workflows across MetaCentrum storage, scratch directories and frontends.
Snakemake is a workflow management system for creating reproducible and scalable data analyses. Workflows are described via a human readable, Python-based language.
Installation
To avoid problems with Python incompatibility, the recommended way to install Snakemake is in a separate Mamba/Conda environment:
(BOOKWORM)user_123@skirit:~$ module add mambaforge
(BOOKWORM)user_123@skirit:~$ mamba create -n snakemake -c conda-forge -c bioconda snakemake
(BOOKWORM)user_123@skirit:~$ mamba activate snakemake
(snakemake) (BOOKWORM)user_123@skirit:~$
Temporary files
If the Snakemake pipeline generates large or numerous temporary files (e.g. when using the segemehl sequence aligner), they should not be copied to or from Snakemake’s default working directory in the user’s home directory. Instead, keep them in SCRATCHDIR and copy only the input and output files.
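The staging pattern described above can be sketched in plain Python, independently of Snakemake. This is only an illustration, not how Snakemake or MetaCentrum implement it: the helper name `run_in_scratch` and its parameters are hypothetical, and the scratch location is assumed to come from the SCRATCHDIR environment variable, falling back to the system temporary directory when it is unset.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(inputs, output_names, work, dest_dir, scratch_root=None):
    """Copy inputs to scratch, run `work` there, copy only named outputs back.

    `work` is any callable taking the scratch working directory (a Path).
    Anything else it writes there is discarded with the scratch directory.
    """
    # Assumption: SCRATCHDIR points to the job's scratch space; otherwise
    # fall back to the system temporary directory.
    root = scratch_root or os.environ.get("SCRATCHDIR") or tempfile.gettempdir()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory(dir=root) as tmp:
        workdir = Path(tmp)
        for src in inputs:
            shutil.copy2(src, workdir / Path(src).name)  # stage input in
        work(workdir)  # intermediate files stay in scratch
        copied = []
        for name in output_names:
            shutil.copy2(workdir / name, dest_dir / name)  # stage output out
            copied.append(dest_dir / name)
    return copied
```

Only the files listed in `output_names` reach `dest_dir`; intermediate files are removed together with the scratch working directory.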
shadow rules usage
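One way to do this is to use a shadow rule. Snakemake then runs the rule in an isolated shadow directory and moves only the declared output files back to the working directory.
For example, to make segemehl store its temporary files in SCRATCHDIR: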
rule segemehl:
    input:
        fastq="{sample}_trim_collapsed.fastq.gz",
        sege_idx="hg19.segemehl.idx",
    output:
        sam="{sample}_sege.sam",
    shadow: "minimal"
    shell: "segemehl.x -S -D 2 -M 1 --briefcigar -t 4 -i {input.sege_idx} -d {FA} -q {input.fastq} > {output.sam}"
together with
snakemake --shadow-prefix $SCRATCHDIR --snakefile Snakefile ...
tmpdir variable usage
Snakemake's tmpdir resource is exported as the TMPDIR (as well as TEMP and TMP) environment variable for shell commands, scripts, wrappers and notebooks, which use it to store temporary data. If the resource is not specified at all, Snakemake just sets tmpdir=$TMPDIR. The tmpdir resource can be overridden either per rule or workflow-wide.
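Setting TMPDIR is often sufficient because many programs consult this variable when deciding where to create temporary files. Python's own `tempfile` module is one such example. The sketch below only illustrates that mechanism and is not Snakemake code; the helper name `tempdir_with` is hypothetical.

```python
import os
import tempfile


def tempdir_with(tmpdir):
    """Return the directory `tempfile` would use when TMPDIR is set to `tmpdir`."""
    old_env = os.environ.get("TMPDIR")
    old_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = str(tmpdir)
        tempfile.tempdir = None  # drop the cached value so TMPDIR is re-read
        return tempfile.gettempdir()
    finally:
        # Restore the previous environment and cache.
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cache
```

Tools that ignore TMPDIR need their own option for the temporary directory, so it is worth checking each tool's documentation.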
Define tmpdir per rule:
rule segemehl:
    input:
        fastq="{sample}_trim_collapsed.fastq.gz",
        sege_idx="hg19.segemehl.idx",
    output:
        sam="{sample}_sege.sam",
    resources: tmpdir="${SCRATCHDIR}"
    shell: "segemehl.x -S -D 2 -M 1 --briefcigar -t 4 -i {input.sege_idx} -d {FA} -q {input.fastq} > {output.sam}"
Define tmpdir workflow-wide:
snakemake --resources tmpdir=${SCRATCHDIR} --snakefile Snakefile ...
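Because a Snakefile is Python-based, the value given to tmpdir can also be computed. The following is a minimal sketch, assuming that SCRATCHDIR may be unset or unusable (for example when the workflow is started outside a batch job); the helper name `resolve_tmpdir` is hypothetical and not part of Snakemake.

```python
import os
import tempfile


def resolve_tmpdir(env=None):
    """Pick a temporary directory: SCRATCHDIR if usable, else the system default."""
    env = os.environ if env is None else env
    scratch = env.get("SCRATCHDIR", "")
    # Use SCRATCHDIR only when it names an existing, writable directory.
    if scratch and os.path.isdir(scratch) and os.access(scratch, os.W_OK):
        return scratch
    return tempfile.gettempdir()
```

In a Snakefile, the result could then be passed to a rule, e.g. `resources: tmpdir=resolve_tmpdir()`.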
