I am trying to understand how to use NSE, but I have gotten lost.
I also struggle with thinking of the search terms to use in order to find
tutorials or code snippets.
I might be tempted to use functions like select_at() or group_by_at(), but their documentation says “lifecycle: superseded”.
What is the best path forward? Should we continue using superseded functions?
GOAL
I want this code to run as expected:
get_composition(d, c("size", "owner", "item"))
Failed attempts
Below, I tried a whole bunch of different things, and none of them work.
library(rlang)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
get_composition <- function(d, c1, c2, c3) {
  d %>%
    select({{ c1 }}, {{ c2 }}, {{ c3 }}) %>%
    group_by({{ c1 }}, {{ c2 }}, {{ c3 }}) %>%
    summarize(n = n()) %>%
    mutate(percent = 100 * n / sum(n))
}
d <- data.frame(
  size = sample(c("large", "small"), size = 1e3, replace = TRUE),
  color = sample(c("red", "blue"), size = 1e3, replace = TRUE),
  owner = sample(letters[11:15], size = 1e3, replace = TRUE),
  item = sample(letters[1:10], size = 1e3, replace = TRUE)
)
This works.
get_composition(d, size, owner, item)
#> `summarise()` regrouping output by 'size', 'owner' (override with `.groups` argument)
#> # A tibble: 100 x 5
#> # Groups: size, owner [10]
#> size owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 large k a 7 7.14
#> 2 large k b 15 15.3
#> 3 large k c 10 10.2
#> 4 large k d 15 15.3
#> 5 large k e 8 8.16
#> 6 large k f 6 6.12
#> 7 large k g 11 11.2
#> 8 large k h 8 8.16
#> 9 large k i 11 11.2
#> 10 large k j 7 7.14
#> # … with 90 more rows
Yep, this also works.
get_composition(d, color, owner, item)
#> `summarise()` regrouping output by 'color', 'owner' (override with `.groups` argument)
#> # A tibble: 100 x 5
#> # Groups: color, owner [10]
#> color owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 blue k a 13 14.1
#> 2 blue k b 9 9.78
#> 3 blue k c 9 9.78
#> 4 blue k d 14 15.2
#> 5 blue k e 5 5.43
#> 6 blue k f 6 6.52
#> 7 blue k g 9 9.78
#> 8 blue k h 9 9.78
#> 9 blue k i 11 12.0
#> 10 blue k j 7 7.61
#> # … with 90 more rows
Can we call get_composition() in a loop like this? Nope.
for (mycol in c("size", "color")) {
  get_composition(d, mycol, owner, item) # Error: Column `mycol` is not found.
}
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(mycol)` instead of `mycol` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> Error: Must group by variables found in `.data`.
#> * Column `mycol` is not found.
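One possible way to make the loop work, offered as a sketch rather than a definitive fix: convert the string to a symbol with rlang::sym() and unquote it with !!, so that {{ c1 }} receives the column size instead of the string "size". The `results` list is a hypothetical addition, needed because a for loop does not auto-print. The `.groups = "drop_last"` argument is also an addition. It assumes dplyr >= 1.0.0 and only makes the default regrouping explicit, which silences the message.

```r
library(dplyr)
library(rlang)

# Same function as above, except for the explicit `.groups` argument
# (an assumption: requires dplyr >= 1.0.0).
get_composition <- function(d, c1, c2, c3) {
  d %>%
    select({{ c1 }}, {{ c2 }}, {{ c3 }}) %>%
    group_by({{ c1 }}, {{ c2 }}, {{ c3 }}) %>%
    summarize(n = n(), .groups = "drop_last") %>%
    mutate(percent = 100 * n / sum(n))
}

d <- data.frame(
  size = sample(c("large", "small"), size = 1e3, replace = TRUE),
  color = sample(c("red", "blue"), size = 1e3, replace = TRUE),
  owner = sample(letters[11:15], size = 1e3, replace = TRUE),
  item = sample(letters[1:10], size = 1e3, replace = TRUE)
)

# Store each result, since values are not printed inside a for loop.
results <- list()
for (mycol in c("size", "color")) {
  # sym() turns the string into a symbol; !! injects that symbol into the
  # argument, so the function sees `size` or `color`, not `mycol`.
  results[[mycol]] <- get_composition(d, !!sym(mycol), owner, item)
}

names(results$size)
names(results$color)
```

The difference from `!!mycol` is that `!!mycol` injects the string itself, which dplyr then treats as a constant value, whereas `!!sym(mycol)` injects a column name.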
OK, let’s make a character variable and try to use it …
mycol <- "size"
These attempts make a column with values equal to “size”. That’s not what we want.
get_composition(d, !!mycol, owner, item)
#> `summarise()` regrouping output by '"size"', 'owner' (override with `.groups` argument)
#> # A tibble: 50 x 5
#> # Groups: "size", owner [5]
#> `"size"` owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 size k a 24 12.1
#> 2 size k b 24 12.1
#> 3 size k c 17 8.59
#> 4 size k d 25 12.6
#> 5 size k e 15 7.58
#> 6 size k f 14 7.07
#> 7 size k g 23 11.6
#> 8 size k h 18 9.09
#> 9 size k i 20 10.1
#> 10 size k j 18 9.09
#> # … with 40 more rows
get_composition(d, rlang::as_name(mycol), owner, item)
#> `summarise()` regrouping output by 'rlang::as_name(mycol)', 'owner' (override with `.groups` argument)
#> # A tibble: 50 x 5
#> # Groups: rlang::as_name(mycol), owner [5]
#> `rlang::as_name(mycol)` owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 size k a 24 12.1
#> 2 size k b 24 12.1
#> 3 size k c 17 8.59
#> 4 size k d 25 12.6
#> 5 size k e 15 7.58
#> 6 size k f 14 7.07
#> 7 size k g 23 11.6
#> 8 size k h 18 9.09
#> 9 size k i 20 10.1
#> 10 size k j 18 9.09
#> # … with 40 more rows
get_composition(d, rlang::enexpr(mycol), owner, item)
#> `summarise()` regrouping output by 'rlang::enexpr(mycol)', 'owner' (override with `.groups` argument)
#> # A tibble: 50 x 5
#> # Groups: rlang::enexpr(mycol), owner [5]
#> `rlang::enexpr(mycol)` owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 size k a 24 12.1
#> 2 size k b 24 12.1
#> 3 size k c 17 8.59
#> 4 size k d 25 12.6
#> 5 size k e 15 7.58
#> 6 size k f 14 7.07
#> 7 size k g 23 11.6
#> 8 size k h 18 9.09
#> 9 size k i 20 10.1
#> 10 size k j 18 9.09
#> # … with 40 more rows
get_composition(d, rlang::string(mycol), owner, item)
#> `summarise()` regrouping output by 'rlang::string(mycol)', 'owner' (override with `.groups` argument)
#> # A tibble: 50 x 5
#> # Groups: rlang::string(mycol), owner [5]
#> `rlang::string(mycol)` owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 size k a 24 12.1
#> 2 size k b 24 12.1
#> 3 size k c 17 8.59
#> 4 size k d 25 12.6
#> 5 size k e 15 7.58
#> 6 size k f 14 7.07
#> 7 size k g 23 11.6
#> 8 size k h 18 9.09
#> 9 size k i 20 10.1
#> 10 size k j 18 9.09
#> # … with 40 more rows
These attempts throw errors:
get_composition(d, rlang::quo(mycol), owner, item)
#> Error: Must subset columns with a valid subscript vector.
#> x Subscript has the wrong type `quosure/formula`.
#> ℹ It must be numeric or character.
get_composition(d, rlang::parse_expr(mycol), owner, item)
#> Error: Problem with `mutate()` input `..1`.
#> x Input `..1` must be a vector, not a symbol.
#> ℹ Input `..1` is `rlang::parse_expr(mycol)`.
get_composition(d, rlang::expr(mycol), owner, item)
#> Error: Can't subset columns that don't exist.
#> x Column `mycol` doesn't exist.
get_composition(d, rlang::ensym(mycol), owner, item)
#> Error: Problem with `mutate()` input `..1`.
#> x Input `..1` must be a vector, not a symbol.
#> ℹ Input `..1` is `rlang::ensym(mycol)`.
This seems to work. It was inspired by this article.
get_composition2 <- function(d, c1, c2, c3) {
  d %>%
    select(.data[[c1]], .data[[c2]], .data[[c3]]) %>%
    group_by(.data[[c1]], .data[[c2]], .data[[c3]]) %>%
    summarize(n = n()) %>%
    mutate(percent = 100 * n / sum(n))
}
get_composition2(d, "size", "owner", "item")
#> `summarise()` regrouping output by 'size', 'owner' (override with `.groups` argument)
#> # A tibble: 100 x 5
#> # Groups: size, owner [10]
#> size owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 large k a 7 7.14
#> 2 large k b 15 15.3
#> 3 large k c 10 10.2
#> 4 large k d 15 15.3
#> 5 large k e 8 8.16
#> 6 large k f 6 6.12
#> 7 large k g 11 11.2
#> 8 large k h 8 8.16
#> 9 large k i 11 11.2
#> 10 large k j 7 7.14
#> # … with 90 more rows
This does not work as expected… how do we fix it?
get_composition3 <- function(d, ...) {
  d %>%
    select({{...}}) %>%
    group_by({{...}}) %>%
    summarize(n = n()) %>%
    mutate(percent = 100 * n / sum(n))
}
get_composition3(d, size, owner, item)
#> Error: object 'owner' not found
get_composition3(d, c("size", "owner", "item"))
#> Error: "x" must be an argument name
get_composition3(d, "size", "owner", "item")
#> Error: unused arguments ("owner", "item")
This seems to work. So, what’s the “stringy” way to do this?
get_composition4 <- function(d, ...) {
  d %>%
    select(...) %>%
    group_by(...) %>%
    summarize(n = n()) %>%
    mutate(percent = 100 * n / sum(n))
}
get_composition4(d, size, owner, item)
#> `summarise()` regrouping output by 'size', 'owner' (override with `.groups` argument)
#> # A tibble: 100 x 5
#> # Groups: size, owner [10]
#> size owner item n percent
#> <chr> <chr> <chr> <int> <dbl>
#> 1 large k a 7 7.14
#> 2 large k b 15 15.3
#> 3 large k c 10 10.2
#> 4 large k d 15 15.3
#> 5 large k e 8 8.16
#> 6 large k f 6 6.12
#> 7 large k g 11 11.2
#> 8 large k h 8 8.16
#> 9 large k i 11 11.2
#> 10 large k j 7 7.14
#> # … with 90 more rows
It’d be nice to have one of these expressions work as expected:
get_composition3(d, c("size", "owner", "item"))
get_composition3(d, "size", "owner", "item")
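A sketch of one way both call forms might be supported, under some assumptions: the function name `get_composition_chr` is hypothetical, the dots are assumed to contain only character values, and `.groups` assumes dplyr >= 1.0.0. The idea is to flatten the dots into one character vector with c(), select with all_of(), and turn the strings into symbols with rlang::syms() before splicing them into group_by() with !!!.

```r
library(dplyr)
library(rlang)

# Hypothetical "stringy" variant: accepts column names as strings, either
# as separate arguments or as a single character vector.
get_composition_chr <- function(d, ...) {
  cols <- c(...)                  # flatten the dots into one character vector
  stopifnot(is.character(cols))   # assumption: only strings are passed
  d %>%
    select(all_of(cols)) %>%      # tidy selection accepts character vectors
    group_by(!!!syms(cols)) %>%   # data masking needs symbols, so convert and splice
    summarize(n = n(), .groups = "drop_last") %>%
    mutate(percent = 100 * n / sum(n))
}

d <- data.frame(
  size = sample(c("large", "small"), size = 1e3, replace = TRUE),
  color = sample(c("red", "blue"), size = 1e3, replace = TRUE),
  owner = sample(letters[11:15], size = 1e3, replace = TRUE),
  item = sample(letters[1:10], size = 1e3, replace = TRUE)
)

res_vec <- get_composition_chr(d, c("size", "owner", "item"))
res_sep <- get_composition_chr(d, "size", "owner", "item")
```

Both calls should produce the same table, because c() yields the same character vector in either case.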
I followed the link in one of the error messages:
https://tidyselect.r-lib.org/reference/faq-external-vector.html
It led me to write this code, which also doesn’t work.
get_composition5 <- function(d, xs) {
  d %>%
    select(all_of(xs)) %>%
    group_by(all_of(xs)) %>%
    summarize(n = n()) %>%
    mutate(percent = 100 * n / sum(n))
}
xs <- c("size", "owner", "item")
get_composition5(d, xs)
#> Error: Problem with `mutate()` input `..1`.
#> x Input `..1` can't be recycled to size 1000.
#> ℹ Input `..1` is `all_of(xs)`.
#> ℹ Input `..1` must be size 1000 or 1, not 3.
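The error above seems consistent with group_by() using data masking rather than tidy selection: all_of(xs) is evaluated as an ordinary expression, and its length-3 result is treated as a new grouping column that cannot be recycled to 1000 rows. A minimal sketch of a possible fix, assuming dplyr >= 1.0.0, is to wrap the selection in across(), which lets selection helpers be used inside data-masking verbs. The name `get_composition5b` is hypothetical.

```r
library(dplyr)

# Hypothetical variant of get_composition5: the only changes are
# across() inside group_by() and an explicit `.groups` argument.
get_composition5b <- function(d, xs) {
  d %>%
    select(all_of(xs)) %>%
    group_by(across(all_of(xs))) %>%   # across() bridges tidy selection into group_by()
    summarize(n = n(), .groups = "drop_last") %>%
    mutate(percent = 100 * n / sum(n))
}

d <- data.frame(
  size = sample(c("large", "small"), size = 1e3, replace = TRUE),
  color = sample(c("red", "blue"), size = 1e3, replace = TRUE),
  owner = sample(letters[11:15], size = 1e3, replace = TRUE),
  item = sample(letters[1:10], size = 1e3, replace = TRUE)
)

xs <- c("size", "owner", "item")
res <- get_composition5b(d, xs)
group_vars(res)
```

With `.groups = "drop_last"`, the result stays grouped by all but the last column in `xs`, so `percent` is computed within each size and owner combination, as in the original function.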
Created on 2021-02-11 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2021-02-11
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.0.2)
#> callr 3.5.1 2020-10-13 [2] CRAN (R 4.0.2)
#> cli 2.2.0 2020-11-20 [2] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [2] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [2] CRAN (R 4.0.2)
#> devtools 2.3.0 2020-04-10 [2] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [2] CRAN (R 4.0.2)
#> dplyr * 1.0.2 2020-08-18 [2] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [2] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [2] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.1.0 2020-10-31 [2] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [2] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [2] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [2] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
#> lifecycle 0.2.0 2020-03-06 [2] CRAN (R 4.0.2)
#> magrittr 2.0.1.9000 2020-12-15 [1] Github (tidyverse/magrittr@bb1c86a)
#> memoise 1.1.0.9000 2020-12-15 [1] Github (r-lib/memoise@0901e3f)
#> pillar 1.4.7 2020-11-20 [2] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [2] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [2] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.0.2)
#> processx 3.4.5 2020-11-30 [2] CRAN (R 4.0.2)
#> ps 1.5.0 2020-12-05 [2] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [2] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang * 0.4.9 2020-11-26 [2] CRAN (R 4.0.2)
#> rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.2)
#> rprojroot 2.0.2 2020-11-15 [2] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [2] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.0.2)
#> testthat 3.0.0 2020-10-31 [2] CRAN (R 4.0.2)
#> tibble 3.0.4 2020-10-12 [2] CRAN (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [2] CRAN (R 4.0.2)
#> usethis 1.6.1 2020-04-29 [2] CRAN (R 4.0.2)
#> utf8 1.1.4 2018-05-24 [2] CRAN (R 4.0.2)
#> vctrs 0.3.5 2020-11-17 [2] CRAN (R 4.0.2)
#> withr 2.3.0 2020-09-22 [2] CRAN (R 4.0.2)
#> xfun 0.19 2020-10-30 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.0.2)
#>
#> [1] /Users/kamil/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library