I have discovered (the existence of) drake, and am contemplating converting my projects, which are heaps of R scripts and Rmd documents, some of them with a simple makefile like this:
# build the whole project
all: merged.Rds report.pdf
# process the input data
merged.Rds: merge-data.R file1.csv file2.csv
	Rscript merge-data.R
# compile the output file from Markdown
report.pdf: merged.Rds report.Rmd
	Rscript -e "rmarkdown::render('report.Rmd')"
# clean up -- delete unneeded files
clean:
	rm -fv merged.Rds
This would seem to correspond to a drake_plan like this:
plan <- drake_plan(
  merged_data = source("merge-data.R"),
  report = rmarkdown::render(knitr_in("report.Rmd"))
)
However, this syntax loses the dependency of merged_data on file1.csv and file2.csv. In highbrow terms, we call source() for its side effects of creating files and declaring functions (and drake_example("main") is arguably guilty of that too, as one has to source functions.R exclusively for that sort of side effect).
I thought of a very clunky fix along the lines of:
plan <- drake_plan(
  merged_data = withAutoprint({
    readLines(file_in("file1.csv"))
    readLines(file_in("file2.csv"))
    source("merge-data.R")
    saveRDS(merged, file = file_out("merged.Rds"))
  }),
  report = rmarkdown::render(knitr_in("report.Rmd"))
)
with the expectation that knitr_in() will figure out that report.Rmd loads the merged file. But drake::make() stumbles on source(), of all things, and cannot find the source for that.
I want to convert stuff with minimal effort, and without really doing anything inside the existing scripts, so that my collaborators could continue running them as is.
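To make that constraint concrete, here is a minimal base-R sketch of the kind of wrapper I would be happy with: it runs an unmodified script in its own environment and returns one object by name, so a target gets a return value instead of relying on side effects. The helper name run_script is hypothetical (it is not a drake function), and the temporary script only stands in for merge-data.R.

```r
# Hypothetical helper (not part of drake): run an existing script unchanged
# in a fresh environment and return one object it creates.
run_script <- function(path, object) {
  env <- new.env(parent = globalenv())
  source(path, local = env)                  # script variables land in env
  get(object, envir = env, inherits = FALSE) # fail loudly if it is missing
}

# Stand-in for merge-data.R, written to a temporary file for this demo.
script <- tempfile(fileext = ".R")
writeLines(c(
  "tb1 <- data.frame(i = 1, x = 1)",
  "tb2 <- data.frame(i = 1, y = 1)",
  "merged <- merge(tb1, tb2, by = 'i')"
), script)

merged <- run_script(script, "merged")
print(merged)
```

In a plan this might be called as run_script(file_in("merge-data.R"), "merged"), but whether drake would then track file1.csv and file2.csv through the script is exactly what I have not been able to verify.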
I reasonably expect that @krlmlr has a very good answer to this.
P.S. Toy example:
library(here)
library(tibble)  # provides tibble()
tb1 <- tibble(i = 1, x = 1)
tb2 <- tibble(i = 1, y = 1)
write.csv(tb1, file = here("file1.csv"), row.names = FALSE)
write.csv(tb2, file = here("file2.csv"), row.names = FALSE)
merge-data.R reads:
### merge files
library(here)
library(dplyr)  # provides full_join()
tb1 <- read.csv(here("file1.csv"))
tb2 <- read.csv(here("file2.csv"))
merged <- full_join(tb1, tb2, by = "i")
saveRDS(merged, file = here("merged.Rds"))
And report.Rmd is:
---
title: "Report"
author: "John Doe"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(here)
```
Here are the results:
```{r print_merged}
readRDS(here("merged.Rds"))
```