Convince me to start using R Markdown

mfherman · October 4, 2017, 1:27pm

I'm a relatively new R user and most of my usage is data manipulation and statistical analysis for social science research. In general, my work consists of one-off analyses using different datasets, rather than ongoing projects where data and results need to be updated or reported on a regular basis. Outputs of the analyses often consist of CSVs that I share with coworkers, but sometimes they are research briefs that I write.

Thus far, I've only used R scripts for my code, organizing the project so that each script does a manageable and specific chunk of the project. Sometimes the projects are somewhat involved and may lead to 15+ scripts for a single project. It seems like many people prefer R Markdown, but I haven't made the jump yet, in part because I'm not totally clear on how this would help my workflow. I can see that using R Markdown for a report template that needs to be frequently updated with new data would be very useful, but I'm having trouble seeing the best way to integrate R Markdown into my work.

I'd appreciate any examples of how and why using R Markdown has been helpful for you OR tips on how to structure projects using R Markdown that would be useful for my use case.

dlsweet · October 4, 2017, 1:33pm

Welcome to R!

One of the main reasons that I have found RMarkdown helpful for writing reports that don't need constant updating or reporting out is simply that I find it very easy to make consistent reports. For my position I often do a variety of data analyses but they all need to be presented in the same format for consistency. I've used RMarkdown to create a template for myself so I only need to change the actual code doing the analysis and the write-up of said analysis.

I haven't been using RMarkdown for very long however (< 6 mo.), but once I started using it, the usefulness and ease of not needing to switch programs for doing a write-up became very very apparent.

tomtec · October 4, 2017, 2:18pm

Hi! I've been using RMarkdown for over a year now. Before that, for any given project I would have code scripts plus README text files plus handwritten notes plus JPG/Postscript files with graphs etc. At the beginning of the project everything would be more or less organized but as time went on I inevitably started losing track of things (I'm not very good at keeping a tidy mental image of a project).

For me, RMarkdown has now become a core component of every project. It not only helps me maintain order, it also ensures reproducibility and consistency (as already noted by @dlsweet). I can keep my code, notes and relevant links all in one place, easy to maintain (it's a text file, after all), and if for some reason you need to keep code files separate (I often do), you can always source them into the notebook.

Besides, I love its versatility - I use it for reports, notes, presentations, blog posts... the closest thing to a data science Swiss army knife that I know of!

dylanjm · October 4, 2017, 2:22pm

I love RMarkdown. I'm a senior in college and I use it for about 95% of my assignments. Any time I need to do data analysis, report writing, math homework, prototyping, etc., I use Rmarkdown. Having the ability to knit to HTML or PDF and the markdown and LaTeX capabilities are really versatile and make working on any kind of deliverable so much easier. I also definitely stand out among my peers in the 'quality' of my work because I'm able to turn in a polished document as opposed to transferring everything to Word (Rmarkdown can knit to Word too).

Yesterday, someone posted a really cool paper on Twitter from Airbnb talking about how most of their data analysis happens in .RMD files. It's a really interesting read! I suggest looking into it.

Link to tweet:

Link to paper:

mfherman · October 4, 2017, 2:46pm

Thanks for the replies, very helpful!

One project organization question I have is, do you use a single R Markdown file for your whole project? For example, is all your data import, cleaning, merging, manipulation, etc. in the same file as your modeling and visualizations? If so, I'm thinking that my R Markdown file could be 1000s of lines and it may be tricky to find specific parts of the project I'm interested in revisiting. In this case, would it be appropriate to develop separate R scripts for each step of the project and then use the R Markdown document to pull everything together?

dlsweet · October 4, 2017, 2:55pm

I have a tendency to just throw all of my data preparation and analysis into one long script (the last I wrote was ~1300 lines of code, with white spaces). However, this is probably a very bad habit to fall into, as it has made navigating that script less than ideal. If I weren't able to collapse code chunks in RStudio, I think I'd go crazy.

Anyways, I haven't used bookdown, which I believe builds on RMarkdown, but from my understanding it can compile multiple .Rmd files into one book (or report). That might be something worth looking into if you'd like to separate out all the different aspects of the reports you compile.

apreshill · October 4, 2017, 3:50pm

This question actually sparked me to create an account here just so I could answer it! I am a professor and researcher, and R Markdown has totally changed the way I work. So here is my pitch.

For research projects, I use R Markdown documents versus R scripts for different purposes. I will typically use R scripts to do things like importing the data, cleaning up variables, typecasting variables, doing any tidying, etc. I have separate scripts for each task, named:

01-import
02-clean-names
03-tidy, etc.

These scripts are short and focused, and named according to the specific thing they do so that I can troubleshoot more easily when something goes wrong (if you use R Markdown for this, your file might fail to knit, and it can sometimes take a while to figure out what went wrong if you have tons of lines of code all in one long file).

Sometimes these scripts include plots so I can refine my code when I am actively working on the script, but typically once I get the code how I want it, the plots are not useful so they don't tend to appear in these R scripts (I use the RStudio IDE during my interactive work sessions).

Then for my analyses and visualizations, I switch to R Markdown. In that file, I call my R scripts for processing/cleaning/tidying at the top in a chunk that looks like this:

```{r load_scripts, include = FALSE}
source('./scripts/01-import.R')
source('./scripts/02-clean-names.R')
source('./scripts/03-tidy.R')
```

These scripts typically have some comments in the code (for example, # this is the problem this next chunk of code addresses), but these scripts don't need any narrative to be useful; they just need to work so I can move on.

Next, I make R Markdown documents. Some are primarily visualizations and results of analyses where all code chunks are hidden using global chunk options at the top of the Rmd file (because my collaborators don't know R and will be confused when they see code) like this:

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

These docs typically use knitr::kable to create nicely formatted tables of output, and include lots of ggplot2 plots.
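A minimal sketch of the kind of table chunk described above. It assumes the built-in mtcars dataset as a stand-in for real project data, and the column labels and caption are made up for illustration:

```r
# Illustrative only: mtcars stands in for a cleaned project dataset.
library(knitr)

# Mean miles-per-gallon and horsepower by number of cylinders.
summary_tbl <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

# kable() returns a formatted table object; inside an Rmd chunk it is
# rendered as a table in the knitted document.
tbl <- kable(
  summary_tbl,
  digits = 1,
  col.names = c("Cylinders", "Mean mpg", "Mean hp"),  # hypothetical labels
  caption = "Toy summary table"
)
tbl
```

With echo = FALSE set globally, only the table itself would appear in the knitted output, not the code that built it.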

By only changing the above global chunk option to TRUE, I then have a complete printout of all my analyses and results, including the R code used to produce each analysis/plot, and the complete output. This is good for my collaborators that know R and can parse the code. The great thing is I don't have to create different R Markdown files for each audience! Because I can annotate and include more narrative in the R Markdown files, I include explaining/teaching/discussion-provoking thoughts in those documents in between the R chunks.

For teaching statistics, I ask students to submit R Markdown files and a knitted version with echo = TRUE as a global option. This has made grading assignments so much easier, and the students can work in one document to analyze AND interpret data (rather than working in R console, and copying/pasting R code and output into a text editor or Word document, then adding narrative).

Ranae · October 4, 2017, 3:50pm

Take a look at the discussion Best Practices for Organizing RMarkdown Projects, specifically this description of not doing everything in RMarkdown.

I have been using RMarkdown a little over a year. At first, I was so excited! I am not what you call an "A type personality" and it looked like .rmd would really help me keep all my data, code, and analysis together - another tool in my fight against my inherent chaos!

But the longer I worked on an analysis, the more my R notebook would become a complete and ugly monster. Scrolling, scrolling, scrolling. . .code, figures, text, tibbles, figures, code. . . I dreaded opening these things up. I quit R Notebooks and went back to scripts with labeled code chunks and comments. Then someone asked to see what I'm working on and I copied and pasted everything into .rmd and by this time my GitHub repo looks as bad as my kitchen counter.

So, if people who are here to argue for the use of RMarkdown could at the same time provide more direction for best practices as intended, including some examples to emulate, this would go a long way toward getting me back on the proper .rmd track.

foundinblank · October 4, 2017, 3:52pm

I only started using RMarkdown (and github, github pages) about a month ago. I'm an experimental psychology postdoc who works remotely, and my lab and supervisor are both 8 time zones behind me. My job is mainly to analyze and test loads of eye tracking and behavioral psychology datasets, so I needed a way to share data reports with my lab.

RMarkdown and R Notebooks to the rescue!! I'm convinced it's the easiest way to communicate my work with my lab team. I write out data science narratives interspersed with code blocks (always hidden...I don't think anyone in my lab reads the code, but it's there!) and charts. I can quickly whip out participant demographic tables, show boxplots, print output from regression models, all in one document. My lab loves it, and when I update it, I always email my team with specific links (usually to H1 headings) and explain what changed.

I break down my analysis in various notebooks, though, and prefix them with 01, 02, 03, and so on. And I have a table of contents in my readme.md/index.html file. 01 is usually for data cleanup and generating a stable *.feather dataset, then 02-etc are for specific types of analysis using the dataset. In one case, I have 01 for data cleanup, 02 for behavioral data, 03 for eye tracking data, and 04 for "writing it all up" as if it was going to be a results section in a manuscript.
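The hand-off between a cleanup notebook and the later analysis notebooks could be sketched roughly like this. To keep the example dependency-free it swaps in base R's saveRDS()/readRDS() for the feather format mentioned above, and the toy data, file name, and temporary directory are all assumptions for illustration:

```r
# --- End of a hypothetical 01-cleanup notebook ---
raw <- data.frame(id = 1:4, score = c(10, NA, 30, 40))  # toy data, not real
clean <- raw[!is.na(raw$score), ]                       # drop incomplete rows

# A real project would write to a stable path inside the project folder;
# tempdir() is used here only so the sketch runs anywhere.
data_path <- file.path(tempdir(), "clean.rds")
saveRDS(clean, data_path)

# --- Start of a hypothetical 02-analysis notebook ---
# Later notebooks read the stable dataset instead of redoing the cleanup.
clean <- readRDS(data_path)
mean_score <- mean(clean$score)
```

The point of the design is that notebooks 02 and onward depend only on the saved file, so each one can be knitted on its own.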

I'm happy to show you privately - can't post my work publicly here.

I like dlsweet's idea about combining .Rmd files in one book or report. I'll check that out soon.

AJF · October 4, 2017, 3:55pm

I recently had a similar issue -- each part of the project was its own R Markdown file (since we liked the ability to turn it into reports, and we found commenting more intuitive)

When we wanted to run it together, we found a sort of hacky way to do it, not sure if it made sense - but essentially, we created a master markdown document, and ran each subdocument in turn by using knitr::knit("subfile1.Rmd", output = tempfile(), quiet = TRUE). The big problem is that we then couldn't run the entire document at once, since each document had a chunk called "setup"... however, it worked for our needs.

We actually did it a bit more complicatedly than I described above (by wrapping each one in a function, and then calling the function, so as to ensure that we only got back one item from each markdown document, so there wouldn't be extra objects floating around our environment), but the above way could work.
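A rough sketch of that wrap-it-in-a-function idea, not our actual code. The helper name run_subdocument, the toy subdocument, and the object names inside it are all hypothetical; the only real API used is knitr::knit() with its envir argument:

```r
library(knitr)

# Write a toy subdocument to a temp file so the sketch is self-contained.
fence <- strrep("`", 3)
sub_path <- file.path(tempdir(), "subfile1.Rmd")
writeLines(c(
  "Toy subdocument",
  "",
  paste0(fence, "{r setup}"),
  "scratch <- 1:10          # intermediate object we do not want to keep",
  "result  <- sum(scratch)  # the one object the caller cares about",
  fence
), sub_path)

# Hypothetical helper: knit a subdocument in its own environment and hand
# back a single named object, so nothing else leaks into the workspace.
run_subdocument <- function(path, result_name) {
  env <- new.env(parent = globalenv())
  knit(path, output = tempfile(fileext = ".md"), quiet = TRUE, envir = env)
  get(result_name, envir = env)
}

total <- run_subdocument(sub_path, "result")
```

After the call, only total exists in the calling environment; scratch stays inside the throwaway environment.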

The other thing we thought of doing is running source(knitr::purl("subfile1.Rmd", output = tempfile(), quiet = TRUE)), but I got scared off by http://r.789695.n4.nabble.com/R-CMD-check-for-the-R-code-from-vignettes-td4691457.html
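For reference, the purl-then-source approach might look something like the sketch below. We did not end up using it, so treat it as untested in a real project; the toy document and object names are made up:

```r
library(knitr)

# Toy Rmd file written to a temp location so the sketch runs on its own.
fence <- strrep("`", 3)
rmd_path <- file.path(tempdir(), "subfile_purl.Rmd")
writeLines(c(
  "Some narrative text that purl() leaves out of the script.",
  "",
  paste0(fence, "{r}"),
  "squares <- (1:3)^2",
  fence
), rmd_path)

# purl() extracts only the R code from the document into a plain script.
r_path <- tempfile(fileext = ".R")
purl(rmd_path, output = r_path, quiet = TRUE)

# Source the extracted script into its own environment.
env <- new.env()
source(r_path, local = env)
env$squares
```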

dlsweet · October 4, 2017, 4:06pm

@apreshill Thanks for the great answer and making an account just to share it!!! This seems like a great way to go about keeping a clean workflow and an easily organized RMarkdown project.

Darren_Dahly · October 4, 2017, 4:17pm

Start using R Markdown to generate reports of your data analyses.
If the data changes, rerun the report with a click of the mouse.
Take 3 days off of work.
On the 4th day, tell your collaborators that the re-analysis is complete.
Be hailed as a hero.

zkamvar · October 4, 2017, 4:33pm

For organization, I like to use a Makefile (here's an example: https://github.com/everhartlab/sclerotinia-366/#readme). This way I can split the project up into meaningful chunks that aren't necessarily linear. Plus, I can direct the output to a separate folder and make a browse-able website.

Here's a minimal Makefile that would render a small website:

.PHONY: all

all: docs/index.html docs/00-cleaning.html docs/01-analysis.html docs/02-plots.html 

# Pattern rule: build docs/<name>.html from analysis/<name>.Rmd
docs/%.html : analysis/%.Rmd
	-R --slave -e "rmarkdown::render(input = '$<', \
	               output_file = '$(@F)', \
	               output_dir = '$(@D)')"

jessemaegan · October 4, 2017, 5:38pm

@Ranae - it looks like you and @apreshill posted at about the same time - her explanation helped clarify (for me) the "How should I organize things?" questions of RMarkdown.

dlsweet · October 4, 2017, 6:01pm

@mfherman Since you said you were a newer R user, have you looked into the book R for Data Science? It's a great resource for getting started with R and really focuses on the tidy model (it is written by Hadley Wickham, after all). The last section of the book is all about communicating results and has chapters on RMarkdown, everything you can do with it, and how to incorporate analysis into it seamlessly.

Ranae · October 4, 2017, 6:13pm

I like it and I'm working more towards this, but at the same time I feel like in doing so I am rejecting the original design and purpose of R Notebooks (at least as described in R4DS). However, as all my physical lab notebooks have also been failures, it is not surprising I can't maintain a digital one.

Hoyt · October 4, 2017, 6:16pm

From a private sector corporate perspective, I've found RMarkdown (specifically knit to HTML) to be an incredibly powerful communication tool for analysis delivered to managers, stakeholders and CxO positions. The Bootstrap framework (for HTML specifically) allows the report to be opened via email, even on a mobile device (with responsive design on mobile). This is very valuable to a CxO on the go who works primarily on their phone. It also allows for low-barrier-to-entry sharing of the reports amongst departments or other analysts (in contrast to Tableau, Power BI, PowerPoint). And finally, given that the HTML output can be opened right in your desktop browser, it allows you to keep the report in a very convenient place (a tab in your browser) that cuts down on 'Alt+Tab' or having to open another application to render.

I think the convenience of the HTML output format is something that isn't praised enough. I've found it to be the most powerful persuasive detail that has allowed me to continue to use RMarkdown for my work.

jennybryan · October 4, 2017, 6:26pm

Something I find important that hasn't come up yet: I like to render R Markdown (and specially-crafted R scripts) so I can revisit an analysis later w/o actually redoing the analysis.

Example: the gapminder data package was created from 3 messy Excel spreadsheets from the Gapminder website. Of course I saved the R scripts, but I also saved rendered versions, so I can see what that process looked like the last time I did it (in 2015, apparently). Click on any .md file here:

You can learn about my data cleaning there without having to download the spreadsheets yourself, install the packages I chose to use, and run all my scripts.

In fact, that README itself was constructed as an .Rmd + a lot of file name discipline! 2017 Jenny would do lots of things differently from ≤2015 Jenny, but let's just ignore that.

I think the concept of rmarkdown::render() is very powerful for a data analyst. It works for .Rmd and .R alike.

mfherman · October 4, 2017, 6:39pm

@dlsweet I’ve worked through nearly all of r4ds and recommend it to anyone who asks me how I learned R! The project organization aspect of R Markdown is what has been giving me the most trouble, so all of these answers (especially @apreshill’s!) have been very helpful. Looking forward to hearing about other R Markdown use cases and ways to organize scripts, etc.

raybuhr · October 5, 2017, 6:33am

Lots of good stuff so far, but I feel like it's a bit focused on generating reports and analysis, when Rmarkdown is really much more than just that.

Rmd files let you mix code (not just R, but other code engines as well) and markdown together to form publication-ready documents.

In layman's terms, Rmarkdown can help you:

write reports for work
publish scientific journal articles
write a book
make a blog
create documentation for your R package
build an interactive dashboard
document your analysis like a science lab notebook
build a wiki
create templates for homework assignments
create templates for technical interviews

All of these options are possible just by adding a few configuration options at the top of the Rmd file (such as title, author, theme, output file format, etc.), using markdown syntax to format your text (such as bold, italics, bullet points, etc.), and inserting "code chunks" to run arbitrary bits of code (such as making a plot using ggplot2 in R, running a SQL query against a remote database just by referring to the connection, performing some text manipulation in Python, etc.). The Rmd file is just a way to section off arbitrary bits of code from different formats/languages, and the tool pandoc and the R packages rmarkdown and knitr parse the Rmd file and build it into the document you want (defined in the config section at the top).
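To make those three pieces concrete, here is a small sketch that writes a toy Rmd file from R: a configuration header, some markdown text, and one code chunk. The title, author, file name, and chunk contents are made up for illustration, and the plot uses base R with the built-in mtcars data rather than ggplot2 so the file has no extra dependencies:

```r
# Build the three parts of a minimal Rmd file as lines of text.
fence <- strrep("`", 3)
rmd_lines <- c(
  "---",                           # configuration section at the top
  'title: "Toy report"',
  'author: "Jane Analyst"',        # hypothetical author
  "output: html_document",
  "---",
  "",
  "## Results",                    # markdown-formatted narrative
  "",
  "Some **bold** narrative with a bullet list:",
  "",
  "- first point",
  "- second point",
  "",
  paste0(fence, "{r scatter, echo = FALSE}"),  # a code chunk
  "plot(mtcars$wt, mtcars$mpg)",
  fence
)

rmd_path <- file.path(tempdir(), "toy-report.Rmd")
writeLines(rmd_lines, rmd_path)

# Building the document needs the rmarkdown package and pandoc installed:
# rmarkdown::render(rmd_path)
```

Changing the output line in the header (for example to pdf_document or word_document) is what switches the kind of document that gets built.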

Hopefully you can see how useful Rmarkdown can be. If all you are doing is transforming bits of information and storing the results somewhere else, you might not need Rmarkdown. But if you have a story to tell with the results and want a flexible tool to help you tell that story in the way you see fit for the situation, Rmarkdown is going to be a great asset.