Behaviour of complete(nesting(!!! select(...))) changes in grouped tibble

I was looking for a way to select multiple variables easily while using complete(nesting(...)) and stumbled on this post: Quick selection of many variables in tidyr::nesting()

The suggestion of using nesting(!!! select(...)) seemed ideal to me but, unlike in that example, I'm using a grouped tibble and getting unexpected behaviour.

Using the same sample data as in the linked post:

library(tidyverse)

df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)
df
#> # A tibble: 3 x 7
#>   group item_id item_name something another value1 value2
#>   <dbl>   <dbl> <chr>     <chr>     <chr>    <int>  <int>
#> 1     1       1 a         c         z            1      4
#> 2     2       2 b         d         y            2      5
#> 3     1       2 b         d         X            3      6

On an ungrouped tibble, complete() gives the same output whether the columns are listed explicitly or supplied via select().

df %>% 
  complete(group, nesting(item_id, item_name, something, another))

#> # A tibble: 6 x 7
#>   group item_id item_name something another value1 value2
#>   <dbl>   <dbl> <chr>     <chr>     <chr>    <int>  <int>
#> 1     1       1 a         c         z            1      4
#> 2     1       2 b         d         X            3      6
#> 3     1       2 b         d         y           NA     NA
#> 4     2       1 a         c         z           NA     NA
#> 5     2       2 b         d         X           NA     NA
#> 6     2       2 b         d         y            2      5

df %>%
  complete(group, nesting(!!!select(., item_id:another)))

#> # A tibble: 6 x 7
#>   group item_id item_name something another value1 value2
#>   <dbl>   <dbl> <chr>     <chr>     <chr>    <int>  <int>
#> 1     1       1 a         c         z            1      4
#> 2     1       2 b         d         X            3      6
#> 3     1       2 b         d         y           NA     NA
#> 4     2       1 a         c         z           NA     NA
#> 5     2       2 b         d         X           NA     NA
#> 6     2       2 b         d         y            2      5

In a grouped tibble, stating the variables explicitly works as expected, with the missing combinations in each group added separately:

df %>% 
  group_by(item_name) %>%
  complete(group, nesting(item_id, item_name, something, another))
#> # A tibble: 5 x 7
#> # Groups:   item_name [2]
#>   group item_id item_name something another value1 value2
#>   <dbl>   <dbl> <chr>     <chr>     <chr>    <int>  <int>
#> 1     1       1 a         c         z            1      4
#> 2     1       2 b         d         X            3      6
#> 3     1       2 b         d         y           NA     NA
#> 4     2       2 b         d         X           NA     NA
#> 5     2       2 b         d         y            2      5

However, when the columns are spliced in with select() as before, the output has nine rows instead of five, some of them duplicates. Removing the duplicates with distinct() gives the same six rows as in the ungrouped examples.

df %>% 
  group_by(item_name) %>%
  complete(group, nesting(!!!select(., item_id:another)))
#> # A tibble: 9 x 7
#> # Groups:   item_name [2]
#>   group item_id item_name something another value1 value2
#>   <dbl>   <dbl> <chr>     <chr>     <chr>    <int>  <int>
#> 1     1       1 a         c         z            1      4
#> 2     1       2 b         d         X            3      6
#> 3     1       2 b         d         y           NA     NA
#> 4     1       1 a         c         z            1      4
#> 5     1       2 b         d         X            3      6
#> 6     1       2 b         d         y           NA     NA
#> 7     2       1 a         c         z           NA     NA
#> 8     2       2 b         d         X           NA     NA
#> 9     2       2 b         d         y            2      5

Could anyone explain to me why this is happening, and whether it is possible to use the select() method on a grouped tibble?
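
One possible explanation, offered as a guess rather than a confirmed diagnosis: in the pipe, `.` refers to the whole grouped tibble, so select() splices full-length column vectors into every group's nesting() call instead of each group's own rows. A minimal sketch of a workaround under that assumption is to splice column names as symbols, so they are looked up inside each group. The names `grouped`, `cols` and `res` are made up for illustration, and the grouping column is dropped from the nesting set because newer tidyr releases may refuse to complete on a grouping column.

```r
library(dplyr)
library(tidyr)
library(rlang)  # for syms()

df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)

grouped <- df %>% group_by(item_name)

# Resolve the column range to names once, on the ungrouped data, and drop
# the grouping column (assumption: it is constant within each group anyway).
cols <- df %>%
  select(item_id:another) %>%
  names() %>%
  setdiff(group_vars(grouped))

# Splice symbols rather than data: nesting() then refers to item_id,
# something and another, which are evaluated within each group.
res <- grouped %>%
  complete(group, nesting(!!!syms(cols)))

res
```

Because the symbols are resolved per group, this should behave like the version with the columns written out by hand, although the column order may differ since the grouping column is placed first.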
