Stuck on some functional programming points

danriggins · October 21, 2021, 11:08pm

Say I have the following tibble:

people <- tibble::tribble(
          ~name,  ~gender, ~state_residence, ~city_residence,
          "Dan",   "Male",        "Indiana",  "Indianapolis",
        "Jerry",   "Male",       "New York", "New York City",
       "Bojack",   "Male",     "California",   "Los Angeles",
       "Leslie", "Female",        "Indiana",        "Pawnee",
          "Liz", "Female",       "New York", "New York City",
        "Jesse",   "Male",     "California", "San Francisco",
       "Daphne", "Female",     "Washington",       "Seattle"
)

And I've defined the following function:

library(dplyr)

# Count rows per category of one column and report each category's share of all rows.
count_and_percent <- function(tbl, var_obj, var_name) {
    tbl %>%
        group_by({{var_obj}}) %>%
        summarize(count = n()) %>%
        mutate(
            variable = var_name,
            percent = round(
                (count/nrow(tbl)*100),
                digits = 2
            )
        ) %>%
        rename(
            category = {{var_obj}}
        )
}

So for example I get:

> count_and_percent(people, state_residence, "state_residence")
# A tibble: 4 × 4
  category   count variable        percent
  <chr>      <int> <chr>             <dbl>
1 California     2 state_residence    28.6
2 Indiana        2 state_residence    28.6
3 New York       2 state_residence    28.6
4 Washington     1 state_residence    14.3

Question 1: Is there a way to modify the function so that the var_obj and var_name arguments can be combined into one argument?

Question 2: If I want to apply the count_and_percent function across multiple columns in the same data frame, could you illustrate how to do that with one of the purrr functions? I'm still trying to wrap my head around this stuff.

williaml · October 21, 2021, 11:12pm

You could do this for part 1:

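count_and_percent <- function(tbl, var_name) {
  tbl %>%
    # .data[[var_name]] looks up the column whose name is stored in the string
    group_by(.data[[var_name]]) %>%
    summarize(count = n()) %>%
    mutate(
      variable = var_name,
      percent = round(
        (count/nrow(tbl)*100),
        digits = 2
      )
    ) %>%
    # rename() uses tidyselect, where all_of() is the supported way to pass a name held in a string
    rename(
      category = all_of(var_name)
    )
}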


> count_and_percent(people, "state_residence")
# A tibble: 4 × 4
  category   count variable        percent
  <chr>      <int> <chr>             <dbl>
1 California     2 state_residence    28.6
2 Indiana        2 state_residence    28.6
3 New York       2 state_residence    28.6
4 Washington     1 state_residence    14.3
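
If you would rather keep passing the bare column name, as in the original function, one possible alternative is to capture that name inside the function and convert it to a string. This is only a sketch, not the approach used above: it assumes rlang::enquo() and rlang::as_label() (rlang is installed alongside dplyr), and the single argument name var is made up for illustration.

```r
library(dplyr)

# Same example data as in the question
people <- tibble::tribble(
  ~name,    ~gender,  ~state_residence, ~city_residence,
  "Dan",    "Male",   "Indiana",        "Indianapolis",
  "Jerry",  "Male",   "New York",       "New York City",
  "Bojack", "Male",   "California",     "Los Angeles",
  "Leslie", "Female", "Indiana",        "Pawnee",
  "Liz",    "Female", "New York",       "New York City",
  "Jesse",  "Male",   "California",     "San Francisco",
  "Daphne", "Female", "Washington",     "Seattle"
)

# `var` is a hypothetical argument name: a bare (unquoted) column
count_and_percent <- function(tbl, var) {
  # Turn the bare column name the caller typed into a string
  var_name <- rlang::as_label(rlang::enquo(var))
  tbl %>%
    group_by(category = {{ var }}) %>%  # group and rename in one step
    summarize(count = n()) %>%
    mutate(
      variable = var_name,
      percent = round(count / nrow(tbl) * 100, digits = 2)
    )
}

res <- count_and_percent(people, state_residence)
```

The trade-off is that a bare-name interface is nicer to type interactively, while the string version is easier to drive from a vector of column names.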

williaml · October 21, 2021, 11:16pm

Are you after something like this for part 2?

library(purrr)
columns <- names(people[2:4])  # gender, state_residence, city_residence
map_df(columns, ~count_and_percent(people, .x))  # one tibble per column, row-bound


# A tibble: 12 × 4
   category      count variable        percent
   <chr>         <int> <chr>             <dbl>
 1 Female            3 gender             42.9
 2 Male              4 gender             57.1
 3 California        2 state_residence    28.6
 4 Indiana           2 state_residence    28.6
 5 New York          2 state_residence    28.6
 6 Washington        1 state_residence    14.3
 7 Indianapolis      1 city_residence     14.3
 8 Los Angeles       1 city_residence     14.3
 9 New York City     2 city_residence     28.6
10 Pawnee            1 city_residence     14.3
11 San Francisco     1 city_residence     14.3
12 Seattle           1 city_residence     14.3
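
For anyone on a newer purrr: as of purrr 1.0.0, map_df() is superseded in favor of map() followed by list_rbind(). A rough equivalent of the call above might look like the sketch below. It assumes purrr >= 1.0.0 is installed, and it uses a slightly condensed string-based count_and_percent so the block runs on its own; the name result is just for illustration.

```r
library(dplyr)
library(purrr)

# Same example data as in the question
people <- tibble::tribble(
  ~name,    ~gender,  ~state_residence, ~city_residence,
  "Dan",    "Male",   "Indiana",        "Indianapolis",
  "Jerry",  "Male",   "New York",       "New York City",
  "Bojack", "Male",   "California",     "Los Angeles",
  "Leslie", "Female", "Indiana",        "Pawnee",
  "Liz",    "Female", "New York",       "New York City",
  "Jesse",  "Male",   "California",     "San Francisco",
  "Daphne", "Female", "Washington",     "Seattle"
)

# Condensed string-based version of the function from this thread
count_and_percent <- function(tbl, var_name) {
  tbl %>%
    group_by(category = .data[[var_name]]) %>%
    summarize(count = n()) %>%
    mutate(
      variable = var_name,
      percent = round(count / nrow(tbl) * 100, digits = 2)
    )
}

columns <- c("gender", "state_residence", "city_residence")

# map() returns a list with one tibble per column name;
# list_rbind() stacks them into a single tibble (assumes purrr >= 1.0.0)
result <- map(columns, function(col) count_and_percent(people, col)) %>%
  list_rbind()
```

The rows come back in the same order as the names in columns, so reordering that vector reorders the blocks in the result.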

danriggins · October 21, 2021, 11:32pm

Yes exactly, thank you!
