Add Summary Analysis

Last updated on 2023-03-20

Overview

Questions

  • How can we specify a rule that has many dynamic input files?

Objectives

  • Add a summary rule that requires Segmented images
  • Use the expand function to simplify creating filenames

The summary script reads the report location from a summary_report entry in config.yaml. Inspect the start of the script:

BASH

head Scripts/SummaryReport.R

OUTPUT

config <- yaml::read_yaml(file = "config.yaml")
filtered_images_path <- config$filter_multimedia
output_path <- config$summary_report

filtered_images <- read.csv(file = filtered_images_path)

dir.create(dirname(output_path), showWarnings = FALSE)
rmarkdown::render("Scripts/Summary.Rmd", output_file=basename(output_path), output_dir=dirname(output_path))

Edit config.yaml to add summary_report:

summary_report: summary/report.html

Change the all rule to require summary/report.html in Snakefile:

rule all:
    input: config["summary_report"]

Add a function that gathers the inputs for the summary rule, followed by the summary rule itself:

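import pandas as pd  # skip if the Snakefile already imports pandas

def get_seg_filenames(wildcards):
  # Look up the output of the filter checkpoint; snakemake re-evaluates
  # this function once the checkpoint has run
  filename = checkpoints.filter.get().output[0]
  df = pd.read_csv(filename)
  ark_ids = df["arkID"].tolist()
  # One segmented image per arkID in the filtered CSV
  return expand('Segmented/{arkID}_segmented.png', arkID=ark_ids)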

rule summary:
  input:
    script="Scripts/SummaryReport.R",
    segmented=get_seg_filenames
  output: config["summary_report"]
  container: "docker://ghcr.io/rocker-org/tidyverse:4.2.2"
  shell: "Rscript {input.script}"
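To see what get_seg_filenames hands back to the summary rule, here is a minimal sketch in plain Python that can be run outside snakemake. It is not the Snakefile code: expand_pattern is a hypothetical, simplified stand-in for snakemake's expand that handles a single wildcard, the standard csv module replaces pandas, and the CSV contents (including the arkID values and the scientificName column) are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for the CSV written by the filter checkpoint.
# The arkID values and the second column are invented for this sketch.
FILTERED_CSV = """arkID,scientificName
aaa111,Example species one
bbb222,Example species two
"""


def read_ark_ids(csv_file):
    """Return the arkID column of an open CSV file as a list of strings."""
    return [row["arkID"] for row in csv.DictReader(csv_file)]


def expand_pattern(pattern, **wildcards):
    """Simplified stand-in for snakemake's expand(): fill one wildcard
    with each value in its list and return the resulting filenames."""
    if len(wildcards) != 1:
        raise ValueError("this sketch supports exactly one wildcard")
    name, values = next(iter(wildcards.items()))
    return [pattern.format(**{name: value}) for value in values]


ark_ids = read_ark_ids(io.StringIO(FILTERED_CSV))
segmented_files = expand_pattern("Segmented/{arkID}_segmented.png", arkID=ark_ids)
print(segmented_files)
```

With two rows in the CSV the list has two filenames, one per arkID. In the workflow, the same list becomes the segmented input of the summary rule, so snakemake schedules the segmentation jobs needed before the report can be rendered.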

Do a dry run to preview the jobs snakemake will run to create summary/report.html. Remove --dry-run to run the workflow.

BASH

snakemake -c1 --use-singularity --dry-run