Neatest way of using dplyr::summarize?

rgm · November 8, 2021, 7:39pm

I want to make a tibble containing the range (minimum and maximum), mean and standard deviation of three variables from a data set (child_cost, adult_cost, con_cost). My code is below. Is there a tidier or quicker way of achieving this, please?

cost_tib <- data %>%
  dplyr::summarise(
    min_child_cost  = min(child_cost),
    max_child_cost  = max(child_cost),
    mean_child_cost = mean(child_cost),
    sd_child_cost   = sd(child_cost),
    min_adult_cost  = min(adult_cost),
    max_adult_cost  = max(adult_cost),
    mean_adult_cost = mean(adult_cost),
    sd_adult_cost   = sd(adult_cost),
    min_con_cost    = min(con_cost),
    max_con_cost    = max(con_cost),
    mean_con_cost   = mean(con_cost),
    sd_con_cost     = sd(con_cost)
  )

Thanks very much!

JackDavison · November 8, 2021, 7:49pm

Yes, there is!

library(tidyverse)

df <- tibble(child_cost = rnorm(n = 10),
             adult_cost = rnorm(n = 10),
             con_cost   = rnorm(n = 10))

Using tidyr

Use pivot_longer to stack all your columns into name/value pairs, summarise by name, then use pivot_wider to spread the results back into a single row.

df %>%
  pivot_longer(child_cost:con_cost) %>%
  group_by(name) %>%
  summarise(mean = mean(value),
            min = min(value),
            max = max(value)) %>%
  pivot_wider(names_from = name, values_from = c(mean, min, max))
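The original question also asked for the standard deviation, which the snippet above leaves out. A minimal sketch of the same approach with sd added, assuming the example df from above (the set.seed call and the cost_tib_long name are only there for illustration). With several values_from columns, pivot_wider names the results as statistic, underscore, variable, so the columns come out as min_child_cost, sd_con_cost and so on.

```r
library(tidyverse)

set.seed(1)  # assumption: fixed seed only so the example is reproducible
df <- tibble(child_cost = rnorm(n = 10),
             adult_cost = rnorm(n = 10),
             con_cost   = rnorm(n = 10))

# One row per variable, one column per statistic (often easier to read)
cost_tib_long <- df %>%
  pivot_longer(child_cost:con_cost) %>%
  group_by(name) %>%
  summarise(min  = min(value),
            max  = max(value),
            mean = mean(value),
            sd   = sd(value))

# Spread back to a single row with one column per statistic/variable pair
cost_tib <- cost_tib_long %>%
  pivot_wider(names_from = name, values_from = c(min, max, mean, sd))
```

If a one-row-per-variable table is good enough, you can stop at cost_tib_long and skip the pivot_wider step.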

Using dplyr

dplyr 1.0.0 introduced the across() function, which applies the same set of summary functions to several columns at once. See here: dplyr 1.0.0: working across columns

df %>%
  summarise(across(c(child_cost, adult_cost, con_cost), 
                   list(min = ~min(.x),
                        mean = ~mean(.x), 
                        max = ~max(.x))))
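By default across() names its output as variable, underscore, function (child_cost_min and so on). A minimal sketch, assuming the example df from above, that adds sd and uses the .names argument to reproduce the column names from the original question (min_child_cost, max_child_cost, ...); the set.seed call is only there for illustration.

```r
library(tidyverse)

set.seed(1)  # assumption: fixed seed only so the example is reproducible
df <- tibble(child_cost = rnorm(n = 10),
             adult_cost = rnorm(n = 10),
             con_cost   = rnorm(n = 10))

cost_tib <- df %>%
  summarise(across(c(child_cost, adult_cost, con_cost),
                   list(min = min, max = max, mean = mean, sd = sd),
                   # put the function name first, e.g. min_child_cost
                   .names = "{.fn}_{.col}"))
```

If the real data contains missing values, the functions can be written as lambdas instead, for example mean = ~mean(.x, na.rm = TRUE).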

rgm · November 8, 2021, 8:35pm

Thank you very much Jack!
