Run at Scale

Last updated on 2023-03-20

Estimated time: 14 minutes

Overview

Questions

  • How can I efficiently scale up my workflow on a cluster?

Objectives

  • Add a memory requirement to a rule
  • View a generic sbatch script to run a workflow at scale
  • Run the workflow at scale

Running with Slurm


  • Ensure rules request appropriate resources
    • threads/CPUs
    • memory
    • GPUs
  • Configure Snakemake to submit Slurm jobs
  • Run the main Snakemake process as a background (batch) job

See the Snakemake threads/resources documentation for details on how to request resources such as threads, memory, and GPUs; a sketch combining these annotations follows below.
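
As a sketch only: the hypothetical rule below combines these annotations in one place. The rule name, files, script, and the gpu resource name are all assumptions; in particular, the resource name your cluster honors for GPUs depends on how your slurm/ profile translates resources into sbatch options.

rule train_model:
    input: "data/features.csv"      # hypothetical input
    output: "results/model.pkl"     # hypothetical output
    threads: 4                      # forwarded to Slurm as a CPU request by the profile
    resources:
        mem_mb=4000,                # memory request in megabytes
        gpu=1                       # assumed resource name; profile-dependent
    shell: "python train.py --threads {threads} {input} {output}"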

Annotating memory requirements


Update the reduce rule to request a specific amount of memory.

rule reduce:
    input: "multimedia.csv"
    params: rows="11"
    output: "reduce/multimedia.csv"
    resources:
        mem_mb=200
    shell: "head -n {params.rows} {input} > {output}"

Review sbatch script


The run-workflow.sh script was copied into your SnakemakeWorkflow directory during the project setup step. Run the following command to view it:

BASH

cat run-workflow.sh

OUTPUT

#!/bin/bash
#SBATCH --account=PAS2136
#SBATCH --time=00:30:00
. Scripts/setup_env.sh
JOBS=10
snakemake --jobs $JOBS --use-singularity --profile slurm/
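
In this script, --jobs $JOBS caps how many Slurm jobs Snakemake keeps active at once, --use-singularity runs each rule inside its declared container, and --profile slurm/ points Snakemake at the directory holding the Slurm submission settings.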

Run background job and monitor progress


Submit run-workflow.sh as a batch job so Snakemake runs in the background and scales up:

BASH

sbatch run-workflow.sh

OUTPUT

Submitted batch job 23985835

Monitor job


BASH

squeue -u $LOGNAME
tail -f slurm-<sbatch_job_number>.out
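
If Slurm accounting is enabled on your cluster (an assumption; not every site enables it), sacct summarizes the state of the main workflow job:

BASH

sacct -j <sbatch_job_number> --format=JobID,JobName,State,Elapsed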

Notice new job logs


Where did my logs go?

BASH

ls logs/
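
Each rule that Snakemake submits to Slurm writes its own log file in this directory, so you can follow individual rules while the workflow runs. The exact file names depend on how the slurm/ profile is configured, so the listing below is just a generic way to find the newest ones:

BASH

# newest log files first; names depend on your slurm/ profile settings
ls -lt logs/ | head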