Add an R Script
Last updated on 2023-03-20 | Edit this page
Overview
Questions
- How can I create a rule using an RStudio friendly R script?
- How can I control what files Snakemake creates?
Objectives
- Create a rule that uses an RStudio friendly R script.
- Run snakemake specifying a target output file
- Create an all rule to control the default output files
- Use a YAML config file to avoid filename duplication
RStudio friendly Rules
Next we want to add an R script that will filter an input CSV file for the species of our choosing. We want this R script to be easily runnable within RStudio. To accomplish this we want to avoid passing command line arguments to the R script as this is difficult to accommodate within RStudio.
See the hard coded paths in the R script
OUTPUT
input_path <- "reduce/multimedia.csv"
output_path <- "filter/multimedia.csv"
...
Run a filtering R script
Add a new rule to the bottom of Snakefile
:
rule filter:
input:
script="Scripts/FilterImagesHardCoded.R",
fishes="reduce/multimedia.csv"
output: "filter/multimedia.csv"
shell: "Rscript {input.script}"
Expected Output:
OUTPUT
Building DAG of jobs...
Nothing to be done (all requested files are present and up to date).
Complete log: .snakemake/log/2023-02-28T164348.500545.snakemake.log
Snakemake didn’t create the filter/multimedia.csv
.
Snakemake Target Files
When Snakemake starts up it first determines a set of target files to create. By default the target files are determined from the first rule unless filenames are passed along the command line to snakemake. After determining the target files Snakemake looks through the rules and creates job plan (called a DAG) to create the desired target.
Create a desired file
For our workflow the first rule is the reduce rule. To change the target file you can just pass it as an argument to snakemake. This is a nice way to build different files from your workflow when developing it.
Run snakemake specifying filter/multimedia.csv
as the
target.
Run snakemake specifying a non-existant target
OUTPUT
Building DAG of jobs...
MissingRuleException:
No rule to produce winninglotterynumbers.txt (if you use input functions make sure that they don't raise unexpected exceptions).
Add a default rule
At the top of Snakefile add a new rule
rule all:
input: "filter/multimedia.csv"
Test it out by removing filter/multimedia.csv and running snakemake with no target
rm filter/multimedia.csv
snakemake -c1
See how the R script reads from the config file:
Use a config file to avoid filename duplication
A better option to allow easy changes to the rows param is to store this value in a config file. Snakemake comes with support for parsing and using a YAML config file.
Create a new file named config.yaml
with the following
contents:
reduce_multimedia: reduce/multimedia.csv
filter_multimedia: filter/multimedia.csv
Then update your Snakefile
to adding the config file
location and using config.yaml
to lookup filenames:
configfile: "config.yaml"
rule all:
input: config["filter_multimedia"]
rule reduce:
input: "multimedia.csv"
params: rows="11"
output: config["reduce_multimedia"]
shell: "head -n {params.rows} {input} > {output}"
rule filter:
input:
script="Scripts/FilterImages.R",
fishes=config["reduce_multimedia"]
output: config["filter_multimedia"]
shell: "Rscript {input.script}"
Running the workflow again: