Hi,
I'm having trouble understanding how the `map` function works. I often would like to apply a function to multiple columns in a dataframe, but sometimes run into problems with some of the dplyr verbs. Below is an example of a simple frequency tabulation by some variable(s). I have had good luck with `janitor::tabyl`, but not so much with `group_by` and `summarize` (or `filter`). Often what I want to do would take greater advantage of the `summarize` function than the examples below, so knowing how `map` might work with it would be useful.
Regards
suppressPackageStartupMessages({
  library(tidyverse)
  library(janitor)
})
# function for tabulating n by group using dplyr verbs
sum_n <- function(df, x) {
  df %>%
    group_by({{ x }}) %>%
    summarise(n = n())
}
# function for tabulating n by group using tabyl
tabyl_n <- function(df, x) {
  df %>%
    tabyl({{ x }}) %>%
    select(-percent) %>%
    as_tibble()
}
# both produce the same output
# using summarize
mtcars %>%
sum_n(cyl)
#> # A tibble: 3 × 2
#> cyl n
#> <dbl> <int>
#> 1 4 11
#> 2 6 7
#> 3 8 14
# using tabyl
mtcars %>%
  tabyl_n(cyl)
#> # A tibble: 3 × 2
#> cyl n
#> <dbl> <dbl>
#> 1 4 11
#> 2 6 7
#> 3 8 14
# but only one of them works with map
# vector for map
sum_vars <- c("cyl", "am", "gear")
# tabyl works with map
map(sum_vars, ~tabyl_n(mtcars, .x))
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(.x)` instead of `.x` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> [[1]]
#> # A tibble: 3 × 2
#> cyl n
#> <dbl> <dbl>
#> 1 4 11
#> 2 6 7
#> 3 8 14
#>
#> [[2]]
#> # A tibble: 2 × 2
#> am n
#> <dbl> <dbl>
#> 1 0 19
#> 2 1 13
#>
#> [[3]]
#> # A tibble: 3 × 2
#> gear n
#> <dbl> <dbl>
#> 1 3 15
#> 2 4 12
#> 3 5 5
# summarize produces this error
map(sum_vars, ~sum_n(mtcars, .x))
#> Error: Must group by variables found in `.data`.
#> * Column `.x` is not found.
Created on 2021-11-30 by the reprex package (v2.0.1)
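A possible workaround, sketched here under assumptions rather than offered as the definitive fix: since `sum_vars` holds column names as strings, the grouping step can select them with `across(all_of(x))` instead of `{{x}}`. The helper name `sum_n_chr` is hypothetical and not part of the code above.

```r
library(dplyr)
library(purrr)

# Hypothetical variant of sum_n() that expects the column name as a string
sum_n_chr <- function(df, x) {
  df %>%
    group_by(across(all_of(x))) %>%  # select the grouping column by name
    summarise(n = n())
}

# Same character vector as in the reprex
sum_vars <- c("cyl", "am", "gear")

# One tibble of counts per variable, returned as a list
res <- map(sum_vars, ~ sum_n_chr(mtcars, .x))
```

The embrace operator `{{x}}` is meant for bare column names such as `sum_n(mtcars, cyl)`, whereas `all_of()` takes a character vector, which is what `map()` passes in here.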