automating table production

sgenter · August 11, 2020, 4:12pm

Hi everyone. I'm attempting to create a "topline" of tabulated frequency distributions for a longish survey. I've figured out how to use tabyl and flextable functions to create nicer looking tables, but have been copying and pasting the commands over and over again, replacing only the variable names and renaming the table objects themselves.

Is there a way to repeat this process with code rather than copying and pasting?

I'm new to R and have never posted here before. I've attempted to follow the guidelines outlined on the welcome page, but am open to feedback about how to ask better questions. I've also searched around quite a bit lookking for similar examples, but since it's all quite new for me I need step-by-step guidance and really appreciate the help.

# load libraries
library(tidyverse)
library(janitor)
library(flextable)
library(officer)

# create dataframe with nominal and ordinal data
gender <- c("Male", "Male", "Male", "Male", "Female", "Female", "Female", "Male", "Female")

opinion <- c("Strongly Agree", "Strongly Agree", "Somewhat Agree", "Somewhat Agree", "Neither", 
             "Neither", "Somewhat Disagree", "Strongly Disagree", "Strongly Disagree")

opinion <- factor(opinion, order = TRUE, levels = c('Strongly Agree','Somewhat Agree', 'Neither',
                                                    'Somewhat Disagree', 'Strongly Disagree'))

df <- data.frame(gender, opinion)

# create tables of tabulated frequencies
gender_tbl <- df %>% 
  tabyl(gender) %>% 
  adorn_totals() %>% 
  adorn_pct_formatting(digits = 0, rounding = "half up")

gender_tbl <- flextable(gender_tbl) %>% 
  set_header_labels(gender = "",
                    percent = "Percent")
                    
gender_tbl <- width(gender_tbl, width = 2.0)
gender_tbl <- align(gender_tbl, j = 1, align = "left", part = "all")
gender_tbl <- align(gender_tbl, j = 3, align = "right", part = "all")

gender_tbl


# so far, I've been copying and pasting for each of the variables in the survey
# as in below.
opinion_tbl <- df %>% 
  tabyl(opinion) %>% 
  adorn_totals() %>% 
  adorn_pct_formatting(digits = 0, rounding = "half up")

opinion_tbl <- flextable(opinion_tbl) %>% 
  set_header_labels(opinion = "",
                    percent = "Percent")

opinion_tbl <- width(opinion_tbl, width = 2.0)
opinion_tbl <- align(opinion_tbl, j = 1, align = "left", part = "all")
opinion_tbl <- align(opinion_tbl, j = 3, align = "right", part = "all")

opinion_tbl

technocrat · August 11, 2020, 8:47pm

Good reprex. My code doesn't render there, but shows use of a function to do this.

suppressPackageStartupMessages(
  {
  library(dplyr)
  library(janitor)
  library(flextable)
  library(officer)
  }
)

# create dataframe with nominal and ordinal data
gender <- c("Male", "Male", "Male", "Male", "Female", "Female", "Female", "Male", "Female")

opinion <- c("Strongly Agree", "Strongly Agree", 
             "Somewhat Agree", "Somewhat Agree", 
             "Neither", "Neither", "Somewhat Disagree",
             "Strongly Disagree", "Strongly Disagree")

opinion <- factor(opinion, order = TRUE, 
                  levels = c('Strongly Agree','Somewhat
                             Agree', 'Neither',
                            'Somewhat Disagree', 'Strongly
                            Disagree'))

df <- data.frame(gender, opinion)

mk_tab <- function(x, y) {
  df %>% 
    tabyl(opinion) %>% 
    adorn_totals() %>% 
    adorn_pct_formatting(digits = 0, rounding = "half up") %>%
    flextable(.) %>% 
    set_header_labels(opinion = "",
                      percent = "Percent") %>%
    width(., width = 2.0) %>%
    align(., j = 1, align = "left", part = "all") %>%
    align(., j = 3, align = "right", part = "all")
}

mk_tab(df,gender)

mk_tab(df,opinion)

sgenter · August 12, 2020, 12:43am

technocrat--I much appreciate your support.

On my end, running the mk_tab(df,opinion) and mk_tab(df,gender) lines at the end leads to two tables that are the same, and that don't preserve the formatting of my original example.

I have a guess for why both may be happening, but I'm not sure how to fix it.

The first is that the tabyl function is being applied to the "opinion" variable, as in:

mk_tab <- function(x, y) {
df %>%
tabyl(opinion) %>%
adorn_totals() %>%

In an ideal world, the function would be applied to all of the variables in my dataset (in this case "gender" and "opinion," but for my task I have closer to 100). I'm also hoping to apply the flextable formatting to all of the variables.

Re the formatting, when I working with the flextable function I could not get anything after this line to work without modifying the object, as in my first example. Using the pipping may have altered the formatting?
set_header_labels(opinion = "",
percent = "Percent")

Other than that your code seems like it would meet my basic goal, if only it could be applied to dozens of variables and preserve the formatting in my original example.

I very much appreciate your help.

technocrat · August 12, 2020, 1:43am

Unfortunately, reprex won't render in connection with flextable.

The function needs re-writing to replace tabyl(opinion) with tabyl(y), which is why the two tables are identical. I've been unable to fix that.

In troubleshooting, I looked at tabyl

## S3 method for class 'data.frame'
tabyl(dat, var1, var2, var3, show_na = TRUE, show_missing_levels = TRUE, ...)

tabyl renders 2- or 3-way contingency tables. To apply it to a data frame containing more than three variables requires a rule to choose which pairs of variables will be run. The simple case is that var1 is fixed and each of the remaining variables is taken in turn. The general case is a permutation of n items taken two at a time, about 5,000 for 100 variables.

Also missing from the problem specification is the desired organization of the png output.

sgenter · August 12, 2020, 1:13pm

Okay, I'll try to clarify, recognizing that you help many people here. I'm using rmarkdown to create a topline for a survey, presenting a one-way table for the frequency distribution for each question in the survey (so even though tabyl can handle two or three variables, I just need one). In the markdown document I have a question from the survey written in the plain text area, then a code chunk just below to create the table. If you were to take the code from my original post and paste it into two code chunks--one for gender and one for opinion--in a markdown document, it would render something like what I'm trying to create (of course in real life I would also write the wording of the question as it appeared in the survey above each table).

Thus far I've been copying and pasting all of the code to create and format the table for each variable in my dataset. My hope is to find a way to handle all of the table creation/formatting at the top of the markdown file and then place an object below for each table. I was optimistic about your original reply because it seemed to be going in that direction, except in your case it seemed like I would be placing a function throughout the document for each table. That would be fine too. My goal is to have something that is more replicable than the approach I'm taking now. That is, if someone comes back and asks me to change something (i.e., "we want you to drop NAs...we want you to make the tables narrower/wider?), I'm hoping to be able to change it in one place to reduce error.

I'm happy to give more detail if it would be helpful, and I continue to be grateful for your support.

technocrat · August 12, 2020, 6:56pm

Ok, that helps. tabyl may not be the right tool for that output. Is there a particular reason you want output as an image, rather than HTML/LaTeX? If there is a likelihood of formatting changes, tabyl may not be the right choice, compared to pander or kabelExtra.

Given a data frame with the subject as row, questions as factor encoded by answer columns, will you be stratifying by gender always or by multiple other characteristics?

For an rmarkdown document, am I right that the flow will be

text
table
...
text
table

If that's the case the chunk function should have as its argument the index of the next table generating function argument.

It's all doable and a good goal if you will be doing this often.

sgenter · August 12, 2020, 7:44pm

Right, that's the idea for the markdown document.

I just started using R, so my choice to use tabyl and flextable were mostly based on the fact that they were the first solutions I came across to produce a table that looked how I wanted it to look. For example, it was important for me to be able to show the total n and percent column, and tabyl was able to do that. I also generally liked the flextable aesthetic, since it replicates what a simple table would look like in a journal, which is why I used that. However, if I could arrive at the same outcome using different packages/functions, I would have no problem taking that approach.

I didn't use HTML or LaTeX in part because I have no experience with them and wouldn't know how to use them. Also, my customer wants a Word document. And, I'm not sure if this is accurate, but I'm wary of producing the markdown in HTML format since my (admittedly uninformed) understanding is it would then have a web address, which they definitely would want to avoid. I'm not sure if producing the image in HTML as you mentioned is the same as producing the markdown as HTML, so if it seems like I'm conflating them, that's why. In short, I don't know what the difference would be in having the output as an image vs. HTML vs. LaTex. I just want the tables to look basically as they do in my original example. I'm open to other ways of getting there.

My dataframe is basically as you described. I do have gender in the dataframe, but so far I'm not cross-tabulating anything, just giving one-way frequencies. For example, the first question essentially asks, "what is your job title?" The customer wants to know how many people reported being in x, y, or z position. To report that, I would paste the survey question in the markdown document with a frequency distribution below it. The next question asks what region they work in. They want a frequency table of that, and so on. At some point we may want to report cross-tabulations to them as well, but first things first.

Last, you said, "the chunk function should have as its argument the index of the next table generating function argument." That sounds right to me, but being new to R, I'm not exactly sure what or where the chunk function is. Is that just the space where code is inserted in a code chunk, or the space between the "r" and the "}" ?

system · September 2, 2020, 7:44pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.