I was looking for a way to select multiple variables easily while using complete(nesting(...)) and stumbled on this post: "Quick selection of many variables in tidyr::nesting()". The suggestion of using nesting(!!! select(...)) seemed ideal to me, but unlike in the example I'm using a grouped tibble and getting strange behaviour.
Using the same sample data as in the linked post:
library(tidyverse)
df <- tibble(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
something = c("c", "d", "d"),
another = c("z", "y", "X"),
value1 = 1:3,
value2 = 4:6
)
> df
# A tibble: 3 x 7
group item_id item_name something another value1 value2
<dbl> <dbl> <chr> <chr> <chr> <int> <int>
1 1 1 a c z 1 4
2 2 2 b d y 2 5
3 1 2 b d X 3 6
On an ungrouped tibble, complete() gives the same output whether the columns are stated explicitly or supplied via select().
df %>%
complete(group, nesting(item_id, item_name, something, another))
#> # A tibble: 6 x 7
#> group item_id item_name something another value1 value2
#> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
#> 1 1 1 a c z 1 4
#> 2 1 2 b d X 3 6
#> 3 1 2 b d y NA NA
#> 4 2 1 a c z NA NA
#> 5 2 2 b d X NA NA
#> 6 2 2 b d y 2 5
df %>%
complete(group, nesting(!!!select(., item_id:another)))
#> # A tibble: 6 x 7
#> group item_id item_name something another value1 value2
#> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
#> 1 1 1 a c z 1 4
#> 2 1 2 b d X 3 6
#> 3 1 2 b d y NA NA
#> 4 2 1 a c z NA NA
#> 5 2 2 b d X NA NA
#> 6 2 2 b d y 2 5
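As a sanity check on the ungrouped case, the two calls can also be compared programmatically rather than by eye. This is only a minimal sketch; the names explicit, spliced and same_result are mine and purely illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)

# Columns named explicitly inside nesting()
explicit <- df %>%
  complete(group, nesting(item_id, item_name, something, another))

# Columns spliced in from select() with !!!
spliced <- df %>%
  complete(group, nesting(!!!select(., item_id:another)))

# TRUE when both approaches produce the same tibble
same_result <- isTRUE(all.equal(explicit, spliced))
same_result
```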
In a grouped tibble, stating the variables explicitly works as expected, with the missing combinations in each group added separately:
df %>%
group_by(item_name) %>%
complete(group, nesting(item_id, item_name, something, another))
#> # A tibble: 5 x 7
#> # Groups: item_name [2]
#> group item_id item_name something another value1 value2
#> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
#> 1 1 1 a c z 1 4
#> 2 1 2 b d X 3 6
#> 3 1 2 b d y NA NA
#> 4 2 2 b d X NA NA
#> 5 2 2 b d y 2 5
However, when using select() as before, the output is different: it has 9 rows instead of 5, some of them duplicated. Using distinct() to remove the duplicate rows generated here gives the same output as in the ungrouped examples.
df %>%
group_by(item_name) %>%
complete(group, nesting(!!!select(., item_id:another)))
#> # A tibble: 9 x 7
#> # Groups: item_name [2]
#> group item_id item_name something another value1 value2
#> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
#> 1 1 1 a c z 1 4
#> 2 1 2 b d X 3 6
#> 3 1 2 b d y NA NA
#> 4 1 1 a c z 1 4
#> 5 1 2 b d X 3 6
#> 6 1 2 b d y NA NA
#> 7 2 1 a c z NA NA
#> 8 2 2 b d X NA NA
#> 9 2 2 b d y 2 5
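One check that might help narrow this down is to look at what select() returns on the grouped tibble on its own. This is a sketch, not an explanation I'm certain of: my assumption is that the . in the pipe refers to the whole grouped tibble rather than to one group at a time, so the columns spliced into nesting() would have full length in every group. The name spliced_cols is only illustrative.

```r
library(dplyr)

# Same sample data as above
df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)

# What select() yields when applied to the grouped tibble as a whole
spliced_cols <- df %>%
  group_by(item_name) %>%
  select(item_id:another)

nrow(spliced_cols)      # rows across all groups, not per group
names(spliced_cols)     # the columns that would be spliced into nesting()
n_groups(spliced_cols)  # grouping is kept by select()
```

If the row count here equals the size of the full data rather than the size of a single group, that would be consistent with every group being completed against all the combinations in the data.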
Could anyone explain why this is happening, and whether it is possible to use the select() approach with a grouped tibble?