True single-file notebooks?

KenWilliams · December 5, 2017, 6:47pm

I'm wondering whether the RStudio team (or others?) have any plans to evolve the R Notebook format to a single-document notebook, or whether the current format is seen as pretty much the desired end state.

In the notebook formats I've worked effectively with previously (e.g. org-mode and Jupyter), the notebook that you interact with also contains whatever output is necessary to export to HTML or PDF or other formats. This includes images, tables, raw console output, etc. It doesn't include the R session data that created it.

By contrast, R Notebooks seem to contain only the input data cells, and the output goes to an external HTML file. This means that if I close my notebook environment, I (sometimes?) lose all the output, and I have to re-run whatever R code created the output, which is sometimes onerous. It also means there doesn't seem to be a mechanism for rendering the HTML (or other) document using an external driver like a Makefile, unless I want to re-run all the code again. I find that setup a lot more awkward than just storing the output in the same document, which I can stick in Git.

I know there's an output caching mechanism available, but it doesn't tend to match my needs very well: usually I want to regenerate when the underlying data has changed, or when some code in a package has changed, not when the (typically very short) code in the notebook has changed. It works much better for me to control re-running manually than to let a caching mechanism guess.
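For reference, knitr's chunk cache can be tied to data files and package versions as well as to the chunk's code: if the value of the `cache.extra` chunk option changes, the cached result is invalidated. This is a minimal sketch, assuming a data file at `data/input.csv`, a hypothetical package `mypkg`, and a hypothetical function `fit_model()` (all placeholders):

```{r fit-model, cache=TRUE, cache.extra=list(tools::md5sum("data/input.csv"), packageVersion("mypkg"))}
# Cache is invalidated when the data file's MD5 hash or mypkg's version changes,
# not only when the code in this chunk changes.
# `mypkg`, fit_model(), and the file path are placeholders.
dat <- read.csv("data/input.csv")
fit <- mypkg::fit_model(dat)
```

Setting `cache.rebuild = TRUE` on a chunk forces it to re-run on the next knit, which keeps re-running under manual control when the automatic check isn't what you want.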

Is something like what I've described in the offing? Or is there a way to achieve it with a configuration I haven't discovered yet?

Thanks!

technocrat · December 5, 2018, 8:54am

I'm sure you've figured this out by now, but the key is creating an ordinary Rmd file like this:

````markdown
---
title: "Demo"
author: "Richard Careaga"
date: "12/5/2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```
````

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

When you run into more elaborate situations with long-running chunks, save the output to .Rda files and adjust the code accordingly.

KenWilliams · December 9, 2018, 4:53pm

Thanks for the suggestion, but that doesn't create a document with the input & output all in one editable file, correct?

technocrat · December 9, 2018, 5:39pm

You're right. The intent is that only the code, plus any manually entered input, is editable. Any external code has to be re-imported and all the code re-run to update the output version.

One option for output that takes a long time to produce is to save it from within a code chunk to an .Rda file in your working directory, comment out the generating code, and replace it with `load("BigOut.Rda")` so the document knits faster.
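A minimal sketch of that save-then-load pattern, assuming a hypothetical helper `cached_result()` (not part of knitr or rmarkdown) that wraps the slow step so the generating code doesn't have to be commented in and out by hand. The `refresh` argument keeps re-running under manual control.

```r
# Hypothetical helper (not part of knitr or rmarkdown): reuse a saved
# result when its .Rda file exists, otherwise compute it once and save it.
cached_result <- function(path, compute, refresh = FALSE) {
  if (!refresh && file.exists(path)) {
    env <- new.env()
    load(path, envir = env)   # restores the object saved under the name `result`
    return(env$result)
  }
  result <- compute()         # the slow step only runs here
  save(result, file = path)   # written as an .Rda file for later sessions
  result
}

# In a chunk: the first knit pays the cost, later knits just load the file.
# Set refresh = TRUE when the underlying data or package code has changed.
big_out <- cached_result("BigOut.Rda", function() {
  Sys.sleep(1)                # stand-in for a long-running computation
  summary(cars)
})
big_out
```

Because the decision to recompute is an explicit argument rather than a hash of the chunk's code, edits to the short notebook code never trigger the expensive step on their own.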

rmarkdown uses knitr to run the code chunks and produce Markdown, then relies on pandoc, a Haskell program, to convert that output (including the code, if you set `echo = TRUE` on the chunks) to almost any format you like, including LaTeX.

Another option is Shiny, which is more interactive in character and can run on an external server that handles the rendering.

jdlong · December 9, 2018, 5:47pm

Ken, can you share more about your use case?

I dance around the edges of some things that feel sort of like this. What I do is closer to having a scheduled job check whether any data has changed; if it has, the job fires off a build process that knits the docs with the new data. But I'm not sure that's really germane to what you're trying to accomplish.
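A rough sketch of that kind of check, assuming a hypothetical `rebuild_if_changed()` helper: it hashes the input files with `tools::md5sum()`, compares the hashes with those stored after the last successful build, and only calls the build step (for example `rmarkdown::render()`) when something differs. The file names in the commented call are placeholders.

```r
# Hypothetical helper: rebuild only when any input file's MD5 hash has
# changed since the last successful build.
rebuild_if_changed <- function(inputs, stamp_file, build) {
  current <- tools::md5sum(inputs)          # named vector: file -> hash
  previous <- if (file.exists(stamp_file)) readRDS(stamp_file) else NULL
  if (identical(current, previous)) {
    return(invisible(FALSE))                # nothing changed; skip the build
  }
  build()                                   # e.g. re-knit the report
  saveRDS(current, stamp_file)              # record hashes only after build() returns
  invisible(TRUE)
}

# Example call from a scheduled job (paths are placeholders):
# rebuild_if_changed(
#   inputs     = c("data/sales.csv", "data/regions.csv"),
#   stamp_file = "report-inputs.rds",
#   build      = function() rmarkdown::render("report.Rmd")
# )
```

Saving the hashes only after `build()` returns means a failed render is retried on the next scheduled run instead of being silently skipped.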

technocrat · December 9, 2018, 6:04pm

I understand your goals better now. I think, however, that the only caching-like mechanism available is to save large output, make minor changes to the ancillary code chunks or inline code, and re-run. If you echo everything, you do get all the input and output in a single rendered document, but to edit it you would have to paste it as plain text into a new Rmd document, re-insert the chunks and inline code, and re-render. That would quickly become a nightmare for me, even with Git.