Trying to diagnose why function works once or twice then errors out on same data, even though environment hasn't changed

joels · February 19, 2021, 12:33am

I'm having a bizarre problem in which a tidyeval function I wrote works fine the first time I run it with a particular data frame, but might or might not work on subsequent attempts. I've provided two reprexes below, just to show a couple of different failure modes. Does anyone know what could be causing this and how to fix it?

library(tidyverse)

fnc = function(data, value.vars, group.vars=NULL) {
  data %>% 
    group_by(across({{group.vars}})) %>% 
    summarise(n=n(), across({{value.vars}}, 
                            list(mean=~mean(.x, na.rm=TRUE),
                                 n.miss=~sum(is.na(.x))), 
                            .names="{.fn}_{.col}"))
}

mtcars %>% fnc(mpg)
#> # A tibble: 1 x 3
#>       n mean_mpg n.miss_mpg
#>   <int>    <dbl>      <int>
#> 1    32     20.1          0

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> # A tibble: 3 x 6
#>   Species     n mean_Petal.Width n.miss_Petal.Wi… mean_Sepal.Width
#> * <fct>   <int>            <dbl>            <int>            <dbl>
#> 1 setosa     50            0.246                0             3.43
#> 2 versic…    50            1.33                 0             2.77
#> 3 virgin…    50            2.03                 0             2.97
#> # … with 1 more variable: n.miss_Sepal.Width <int>

diamonds %>% fnc(c(x,y), c(cut, color))
#> `summarise()` has grouped output by 'cut'. You can override using the `.groups` argument.
#> # A tibble: 35 x 7
#> # Groups:   cut [5]
#>    cut   color     n mean_x n.miss_x mean_y n.miss_y
#>    <ord> <ord> <int>  <dbl>    <int>  <dbl>    <int>
#>  1 Fair  D       163   6.02        0   5.96        0
#>  2 Fair  E       224   5.91        0   5.86        0
#>  3 Fair  F       312   5.99        0   5.93        0
#>  4 Fair  G       314   6.17        0   6.11        0
#>  5 Fair  H       303   6.58        0   6.50        0
#>  6 Fair  I       175   6.56        0   6.49        0
#>  7 Fair  J       119   6.75        0   6.68        0
#>  8 Good  D       662   5.62        0   5.63        0
#>  9 Good  E       933   5.62        0   5.63        0
#> 10 Good  F       909   5.69        0   5.71        0
#> # … with 25 more rows

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

diamonds %>% fnc(c(x,y))
#> Error: Problem with `summarise()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

Created on 2021-02-18 by the reprex package (v1.0.0)

library(tidyverse)

fnc = function(data, value.vars, group.vars=NULL) {
  data %>% 
    group_by(across({{group.vars}})) %>% 
    summarise(n=n(), across({{value.vars}}, 
                            list(mean=~mean(.x, na.rm=TRUE),
                                 n.miss=~sum(is.na(.x))), 
                            .names="{.fn}_{.col}"))
}

diamonds %>% fnc(c(x,y))
#> # A tibble: 1 x 5
#>       n mean_x n.miss_x mean_y n.miss_y
#>   <int>  <dbl>    <int>  <dbl>    <int>
#> 1 53940   5.73        0   5.73        0

mtcars %>% fnc(mpg)
#> # A tibble: 1 x 3
#>       n mean_mpg n.miss_mpg
#>   <int>    <dbl>      <int>
#> 1    32     20.1          0

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> # A tibble: 3 x 6
#>   Species     n mean_Petal.Width n.miss_Petal.Wi… mean_Sepal.Width
#> * <fct>   <int>            <dbl>            <int>            <dbl>
#> 1 setosa     50            0.246                0             3.43
#> 2 versic…    50            1.33                 0             2.77
#> 3 virgin…    50            2.03                 0             2.97
#> # … with 1 more variable: n.miss_Sepal.Width <int>

diamonds %>% fnc(c(x,y), c(cut, color))
#> `summarise()` has grouped output by 'cut'. You can override using the `.groups` argument.
#> # A tibble: 35 x 7
#> # Groups:   cut [5]
#>    cut   color     n mean_x n.miss_x mean_y n.miss_y
#>    <ord> <ord> <int>  <dbl>    <int>  <dbl>    <int>
#>  1 Fair  D       163   6.02        0   5.96        0
#>  2 Fair  E       224   5.91        0   5.86        0
#>  3 Fair  F       312   5.99        0   5.93        0
#>  4 Fair  G       314   6.17        0   6.11        0
#>  5 Fair  H       303   6.58        0   6.50        0
#>  6 Fair  I       175   6.56        0   6.49        0
#>  7 Fair  J       119   6.75        0   6.68        0
#>  8 Good  D       662   5.62        0   5.63        0
#>  9 Good  E       933   5.62        0   5.63        0
#> 10 Good  F       909   5.69        0   5.71        0
#> # … with 25 more rows

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

mtcars %>% fnc(mpg, cyl)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

diamonds %>% fnc(c(x,y), color)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 7 elements.

Created on 2021-02-18 by the reprex package (v1.0.0)

martin.R · February 19, 2021, 10:56am

There was a very similar issue on here very recently, in which the order of calling some functions separately affected the output. Somebody logged it as an issue on GitHub. Unfortunately I cannot find it either here or on GitHub, but it may be related.

Sorry not to be of more help, but this might help you or somebody else locate the other issue.

joels · February 19, 2021, 6:08pm

Thanks Martin. I haven't been able to find it either. I've posted this as an issue on the dplyr GitHub repository.

nirgrahamuk · February 19, 2021, 8:36pm

Hi Joel, a small suggestion: share your sessionInfo(), because this may be version related.
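For anyone following along, a more compact alternative to the full sessionInfo() output is to print only the versions of the packages most likely involved. This is a minimal sketch in base R; the package list is an assumption, so adjust it to whatever you suspect.

```r
# Packages assumed to be relevant here (an illustrative guess, not a definitive list).
pkgs <- c("dplyr", "rlang", "vctrs", "tidyselect")

# Keep only packages that are installed; system.file() does not load them.
installed <- pkgs[vapply(pkgs,
                         function(p) nzchar(system.file(package = p)),
                         logical(1))]

# packageVersion() reads the installed DESCRIPTION file of each package.
versions <- vapply(installed,
                   function(p) as.character(utils::packageVersion(p)),
                   character(1))
print(versions)
```

Because nothing is attached or loaded by this check, it should not change the state of the session you are trying to diagnose.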

joels · February 19, 2021, 8:54pm

Good suggestion, Nir. I reran the first reprex, but with calls to sessionInfo() before and after running the function (see below). It turns out that the namespaces of two additional packages, fansi and utf8, are loaded after the function is run for the first time. In the second and third calls to sessionInfo(), you can see them in the namespace package list at positions 35 and 47. Presumably this is the source of the problem, but I'm not sure what's actually going wrong. I'll try uninstalling both packages (none of the packages in my R setup seem to depend on them, and I don't recall installing them explicitly) and see if that fixes the problem.

library(tidyverse)

fnc = function(data, value.vars, group.vars=NULL) {
  data %>% 
    group_by(across({{group.vars}})) %>% 
    summarise(n=n(), across({{value.vars}}, 
                            list(mean=~mean(.x, na.rm=TRUE),
                                 n.miss=~sum(is.na(.x))), 
                            .names="{.fn}_{.col}"))
}

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4    
#> [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3  
#> [9] tidyverse_1.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_4.0.3   
#>  [5] dbplyr_2.1.0      highr_0.8         tools_4.0.3       digest_0.6.27    
#>  [9] lubridate_1.7.9.2 jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0  
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.10      reprex_1.0.0     
#> [17] cli_2.3.0         DBI_1.1.1         yaml_2.2.1        haven_2.3.1      
#> [21] xfun_0.20         withr_2.4.1       xml2_1.3.2        httr_1.4.2       
#> [25] styler_1.3.2      knitr_1.31        hms_1.0.0         generics_0.1.0   
#> [29] fs_1.5.0          vctrs_0.3.6       grid_4.0.3        tidyselect_1.1.0 
#> [33] glue_1.4.2        R6_2.5.0          readxl_1.3.1      rmarkdown_2.6    
#> [37] modelr_0.1.8      magrittr_2.0.1    backports_1.2.1   scales_1.1.1     
#> [41] ellipsis_0.3.1    htmltools_0.5.1.1 rvest_0.3.6       assertthat_0.2.1 
#> [45] colorspace_2.0-0  stringi_1.5.3     munsell_0.5.0     broom_0.7.4      
#> [49] crayon_1.4.0

mtcars %>% fnc(mpg)
#> # A tibble: 1 x 3
#>       n mean_mpg n.miss_mpg
#>   <int>    <dbl>      <int>
#> 1    32     20.1          0

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4    
#> [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3  
#> [9] tidyverse_1.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_4.0.3   
#>  [5] dbplyr_2.1.0      highr_0.8         tools_4.0.3       digest_0.6.27    
#>  [9] lubridate_1.7.9.2 jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0  
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.10      reprex_1.0.0     
#> [17] cli_2.3.0         DBI_1.1.1         yaml_2.2.1        haven_2.3.1      
#> [21] xfun_0.20         withr_2.4.1       xml2_1.3.2        httr_1.4.2       
#> [25] styler_1.3.2      knitr_1.31        hms_1.0.0         generics_0.1.0   
#> [29] fs_1.5.0          vctrs_0.3.6       grid_4.0.3        tidyselect_1.1.0 
#> [33] glue_1.4.2        R6_2.5.0          fansi_0.4.2       readxl_1.3.1     
#> [37] rmarkdown_2.6     modelr_0.1.8      magrittr_2.0.1    backports_1.2.1  
#> [41] scales_1.1.1      ellipsis_0.3.1    htmltools_0.5.1.1 rvest_0.3.6      
#> [45] assertthat_0.2.1  colorspace_2.0-0  utf8_1.1.4        stringi_1.5.3    
#> [49] munsell_0.5.0     broom_0.7.4       crayon_1.4.0

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> # A tibble: 3 x 6
#>   Species     n mean_Petal.Width n.miss_Petal.Wi… mean_Sepal.Width
#> * <fct>   <int>            <dbl>            <int>            <dbl>
#> 1 setosa     50            0.246                0             3.43
#> 2 versic…    50            1.33                 0             2.77
#> 3 virgin…    50            2.03                 0             2.97
#> # … with 1 more variable: n.miss_Sepal.Width <int>

diamonds %>% fnc(c(x,y), c(cut, color))
#> `summarise()` has grouped output by 'cut'. You can override using the `.groups` argument.
#> # A tibble: 35 x 7
#> # Groups:   cut [5]
#>    cut   color     n mean_x n.miss_x mean_y n.miss_y
#>    <ord> <ord> <int>  <dbl>    <int>  <dbl>    <int>
#>  1 Fair  D       163   6.02        0   5.96        0
#>  2 Fair  E       224   5.91        0   5.86        0
#>  3 Fair  F       312   5.99        0   5.93        0
#>  4 Fair  G       314   6.17        0   6.11        0
#>  5 Fair  H       303   6.58        0   6.50        0
#>  6 Fair  I       175   6.56        0   6.49        0
#>  7 Fair  J       119   6.75        0   6.68        0
#>  8 Good  D       662   5.62        0   5.63        0
#>  9 Good  E       933   5.62        0   5.63        0
#> 10 Good  F       909   5.69        0   5.71        0
#> # … with 25 more rows

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

diamonds %>% fnc(c(x,y))
#> Error: Problem with `summarise()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

mtcars %>% fnc(mpg)
#> Error: Problem with `summarise()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4    
#> [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3  
#> [9] tidyverse_1.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_4.0.3   
#>  [5] dbplyr_2.1.0      highr_0.8         tools_4.0.3       digest_0.6.27    
#>  [9] lubridate_1.7.9.2 jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0  
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.10      reprex_1.0.0     
#> [17] cli_2.3.0         DBI_1.1.1         yaml_2.2.1        haven_2.3.1      
#> [21] xfun_0.20         withr_2.4.1       xml2_1.3.2        httr_1.4.2       
#> [25] styler_1.3.2      knitr_1.31        hms_1.0.0         generics_0.1.0   
#> [29] fs_1.5.0          vctrs_0.3.6       grid_4.0.3        tidyselect_1.1.0 
#> [33] glue_1.4.2        R6_2.5.0          fansi_0.4.2       readxl_1.3.1     
#> [37] rmarkdown_2.6     modelr_0.1.8      magrittr_2.0.1    backports_1.2.1  
#> [41] scales_1.1.1      ellipsis_0.3.1    htmltools_0.5.1.1 rvest_0.3.6      
#> [45] assertthat_0.2.1  colorspace_2.0-0  utf8_1.1.4        stringi_1.5.3    
#> [49] munsell_0.5.0     broom_0.7.4       crayon_1.4.0

Created on 2021-02-19 by the reprex package (v1.0.0)
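The before/after comparison of sessionInfo() output above can also be done programmatically rather than by eye. The following is a minimal sketch in base R; `newly_loaded()` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: report namespaces that get loaded while `code` runs.
newly_loaded <- function(code) {
  before <- loadedNamespaces()
  force(code)                      # evaluate the lazily passed expression
  after <- loadedNamespaces()
  sort(setdiff(after, before))     # namespaces present only afterwards
}

# Illustration with a base package; the result depends on what is
# already loaded in your session, so no output is shown here.
newly_loaded(tools::file_ext("report.txt"))
```

To reproduce the comparison in this thread, the call would look something like `newly_loaded(print(mtcars %>% fnc(mpg)))`. Wrapping the `print()` call matters, since some namespaces may only be loaded when a result is first printed. A difference in loaded namespaces is only a clue, not proof that those packages cause the error.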

StatSteph · February 19, 2021, 8:56pm

Yes, here are the issues that might be related:

github.com/tidyverse/dplyr

group_by and summarise don't work after another analysis. "Can't subset elements that don't exist."

opened 04:47PM - 03 Feb 21 UTC

closed 02:59PM - 15 Feb 21 UTC

szimmer

bug

When doing analysis using group_by and summarise, an isolated analysis works but when an unrelated analysis precedes it, there is an error. This does not occur with dplyr 1.0.2. Issue originally posted on community.rstudio.com by someone else: https://community.rstudio.com/t/it-works-alone-but-fails-successively/95017

Working example:

``` r
library(tidyverse)
library(palmerpenguins)

# code B
penguins %>%
  group_by(species) %>%
  summarise(
    n = n(),
    across(starts_with("bill_"), mean, na.rm = TRUE),
    Area = mean(bill_length_mm * bill_depth_mm, na.rm = TRUE),
    across(ends_with("_g"), mean, na.rm = TRUE),
  )
#> # A tibble: 3 x 6
#>   species       n bill_length_mm bill_depth_mm  Area body_mass_g
#> * <fct>     <int>          <dbl>         <dbl> <dbl>       <dbl>
#> 1 Adelie      152           38.8          18.3  712.       3701.
#> 2 Chinstrap    68           48.8          18.4  900.       3733.
#> 3 Gentoo      124           47.5          15.0  712.       5076.
```

Created on 2021-02-03 by the [reprex package](https://reprex.tidyverse.org) (v1.0.0)

<details style="margin-bottom:10px;">
<summary>Session info</summary>

``` r
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/New_York
#>  date     2021-02-03
#>
#> - Packages -------------------------------------------------------------------
#>  package        * version date       lib source
#>  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  backports        1.2.0   2020-11-02 [1] CRAN (R 4.0.3)
#>  broom            0.7.4   2021-01-29 [1] CRAN (R 4.0.3)
#>  cellranger       1.1.0   2016-07-27 [1] CRAN (R 4.0.2)
#>  cli              2.0.2   2020-02-28 [1] CRAN (R 4.0.2)
#>  colorspace       2.0-0   2020-11-11 [1] CRAN (R 4.0.3)
#>  crayon           1.3.4   2017-09-16 [1] CRAN (R 4.0.2)
#>  DBI              1.1.0   2019-12-15 [1] CRAN (R 4.0.2)
#>  dbplyr           2.0.0   2020-11-03 [1] CRAN (R 4.0.3)
#>  digest           0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr          * 1.0.4   2021-02-02 [1] CRAN (R 4.0.3)
#>  ellipsis         0.3.1   2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate         0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi            0.4.1   2020-01-08 [1] CRAN (R 4.0.2)
#>  forcats        * 0.5.1   2021-01-27 [1] CRAN (R 4.0.3)
#>  fs               1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics         0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  ggplot2        * 3.3.3   2020-12-30 [1] CRAN (R 4.0.3)
#>  glue             1.4.2   2020-08-27 [1] CRAN (R 4.0.3)
#>  gtable           0.3.0   2019-03-25 [1] CRAN (R 4.0.2)
#>  haven            2.3.1   2020-06-01 [1] CRAN (R 4.0.2)
#>  highr            0.8     2019-03-20 [1] CRAN (R 4.0.2)
#>  hms              1.0.0   2021-01-13 [1] CRAN (R 4.0.3)
#>  htmltools        0.5.0   2020-06-16 [1] CRAN (R 4.0.2)
#>  httr             1.4.2   2020-07-20 [1] CRAN (R 4.0.2)
#>  jsonlite         1.7.2   2020-12-09 [1] CRAN (R 4.0.3)
#>  knitr            1.30    2020-09-22 [1] CRAN (R 4.0.3)
#>  lifecycle        0.2.0   2020-03-06 [1] CRAN (R 4.0.2)
#>  lubridate        1.7.9.2 2020-11-13 [1] CRAN (R 4.0.3)
#>  magrittr         2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  modelr           0.1.8   2020-05-19 [1] CRAN (R 4.0.2)
#>  munsell          0.5.0   2018-06-12 [1] CRAN (R 4.0.2)
#>  palmerpenguins * 0.1.0   2020-07-23 [1] CRAN (R 4.0.3)
#>  pillar           1.4.7   2020-11-20 [1] CRAN (R 4.0.3)
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.0.2)
#>  purrr          * 0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
#>  R.cache          0.14.0  2019-12-06 [1] CRAN (R 4.0.3)
#>  R.methodsS3      1.8.1   2020-08-26 [1] CRAN (R 4.0.3)
#>  R.oo             1.24.0  2020-08-26 [1] CRAN (R 4.0.3)
#>  R.utils          2.10.1  2020-08-26 [1] CRAN (R 4.0.3)
#>  R6               2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  Rcpp             1.0.5   2020-07-06 [1] CRAN (R 4.0.2)
#>  readr          * 1.4.0   2020-10-05 [1] CRAN (R 4.0.3)
#>  readxl           1.3.1   2019-03-13 [1] CRAN (R 4.0.2)
#>  rematch2         2.1.2   2020-05-01 [1] CRAN (R 4.0.2)
#>  reprex           1.0.0   2021-01-27 [1] CRAN (R 4.0.3)
#>  rlang            0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
#>  rmarkdown        2.5     2020-10-21 [1] CRAN (R 4.0.3)
#>  rstudioapi       0.11    2020-02-07 [1] CRAN (R 4.0.2)
#>  rvest            0.3.6   2020-07-25 [1] CRAN (R 4.0.2)
#>  scales           1.1.1   2020-05-11 [1] CRAN (R 4.0.2)
#>  sessioninfo      1.1.1   2018-11-05 [1] CRAN (R 4.0.3)
#>  stringi          1.5.3   2020-09-09 [1] CRAN (R 4.0.3)
#>  stringr        * 1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  styler           1.3.2   2020-02-23 [1] CRAN (R 4.0.3)
#>  tibble         * 3.0.6   2021-01-29 [1] CRAN (R 4.0.3)
#>  tidyr          * 1.1.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  tidyselect       1.1.0   2020-05-11 [1] CRAN (R 4.0.2)
#>  tidyverse      * 1.3.0   2019-11-21 [1] CRAN (R 4.0.3)
#>  utf8             1.1.4   2018-05-24 [1] CRAN (R 4.0.2)
#>  vctrs            0.3.5   2020-11-17 [1] CRAN (R 4.0.3)
#>  withr            2.3.0   2020-09-22 [1] CRAN (R 4.0.3)
#>  xfun             0.19    2020-10-30 [1] CRAN (R 4.0.3)
#>  xml2             1.3.2   2020-04-23 [1] CRAN (R 4.0.2)
#>  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] ../Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.3/library
```

</details>

Broken example:

``` r
library(tidyverse)
library(palmerpenguins)

# code A
penguins %>%
  group_by(species, island) %>%
  summarise(
    prob = c(.25, .75),
    across(
      c(bill_length_mm, bill_depth_mm, flipper_length_mm),
      ~ quantile(., prob, na.rm = TRUE)
    )
  )
#> `summarise()` has grouped output by 'species', 'island'. You can override using the `.groups` argument.
#> # A tibble: 10 x 6
#> # Groups:   species, island [5]
#>    species   island     prob bill_length_mm bill_depth_mm flipper_length_mm
#>    <fct>     <fct>     <dbl>          <dbl>         <dbl>             <dbl>
#>  1 Adelie    Biscoe     0.25           37.7          17.6              185.
#>  2 Adelie    Biscoe     0.75           40.7          19.0              193
#>  3 Adelie    Dream      0.25           36.8          17.5              185
#>  4 Adelie    Dream      0.75           40.4          18.8              193
#>  5 Adelie    Torgersen  0.25           36.7          17.4              187
#>  6 Adelie    Torgersen  0.75           41.1          19.2              195
#>  7 Chinstrap Dream      0.25           46.3          17.5              191
#>  8 Chinstrap Dream      0.75           51.1          19.4              201
#>  9 Gentoo    Biscoe     0.25           45.3          14.2              212
#> 10 Gentoo    Biscoe     0.75           49.6          15.7              221

# code B
penguins %>%
  group_by(species) %>%
  summarise(
    n = n(),
    across(starts_with("bill_"), mean, na.rm = TRUE),
    Area = mean(bill_length_mm * bill_depth_mm, na.rm = TRUE),
    across(ends_with("_g"), mean, na.rm = TRUE),
  )
#> Error: Can't subset elements that don't exist.
#> x Location 5 doesn't exist.
#> i There are only 3 elements.

rlang::last_error()
#> <error/vctrs_error_subscript_oob>
#> Can't subset elements that don't exist.
#> x Location 5 doesn't exist.
#> i There are only 3 elements.
#> Backtrace:
#>   1. `%>%`(...)
#>  32. dplyr::cur_group()
#>  33. peek_mask("cur_group()")$current_key()
#>  34. vctrs::vec_slice(private$keys, self$get_current_group())
#>  36. vctrs:::stop_subscript_oob(...)
#>  37. vctrs:::stop_subscript(...)
#> Run `rlang::last_trace()` to see the full context.
```

Created on 2021-02-03 by the [reprex package](https://reprex.tidyverse.org) (v1.0.0)

Session info: identical to the session info listed under the working example.

github.com/tidyverse/dplyr

group_by() throws error if across() is third or higher argument, but only after unrelated analysis using across()

opened 02:38AM - 06 Feb 21 UTC

closed 03:49PM - 15 Feb 21 UTC

CoryMcCartan

bug

After upgrading to dplyr 1.0.4, noticing this strange behavior, which only surfaces after a first call to `summarize()` which uses `across()`. After this first call, `group_by()` will only work with `across()` if `across()` is the first argument after the data frame. If it is the second, an error is thrown: `Problem with mutate() input ..2`. Possibly related to #5733

``` r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

df = tibble(y = rep(1:2, each=6), z = rep(3:5, 4), x = rpois(12, 7))

mtcars %>%
    mutate(x = rpois(n(), 7)) %>%
    group_by(cyl) %>%
    summarize(across(where(is.numeric), sum))
#> # A tibble: 3 x 12
#>     cyl   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb     x
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1     4  293. 1156.   909  44.8  25.1  211.    10     8    45    17    63
#> 2     6  138. 1283.   856  25.1  21.8  126.     4     3    27    24    41
#> 3     8  211. 4943.  2929  45.2  56.0  235.     0     2    46    49   103

group_by(df, across(all_of("z")), y) %>%
    summarize(x= mean(x))
#> `summarise()` has grouped output by 'z'. You can override using the `.groups` argument.
#> # A tibble: 6 x 3
#> # Groups:   z [3]
#>       z     y     x
#>   <int> <int> <dbl>
#> 1     3     1   7.5
#> 2     3     2   3.5
#> 3     4     1   7.5
#> 4     4     2   2
#> 5     5     1   7
#> 6     5     2   5

group_by(df, y, across(all_of("z"))) %>%
    summarize(x= mean(x))
#> Error: Problem adding computed columns in `group_by()`.
#> x Problem with `mutate()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

group_by(df, across(all_of("z")), y) %>%
    summarize(x= mean(x))
#> `summarise()` has grouped output by 'z'. You can override using the `.groups` argument.
#> # A tibble: 6 x 3
#> # Groups:   z [3]
#>       z     y     x
#>   <int> <int> <dbl>
#> 1     3     1   7.5
#> 2     3     2   3.5
#> 3     4     1   7.5
#> 4     4     2   2
#> 5     5     1   7
#> 6     5     2   5
```

Created on 2021-02-05 by the [reprex package](https://reprex.tidyverse.org) (v1.0.0)

nirgrahamuk · February 19, 2021, 8:58pm

github.com/tidyverse/dplyr

internal error dealing with .current_group

tidyverse:master ← tidyverse:across_fix_5733

opened 02:29PM - 15 Feb 21 UTC

romainfrancois

+4 -6

closes #5733

It's the kind of problem that is not easily reproducible in a test, but the original code from #5733 now gives:

``` r
library(tidyverse)
library(palmerpenguins)

# code A
penguins %>%
  group_by(species, island) %>%
  summarise(
    prob = c(.25, .75),
    across(
      c(bill_length_mm, bill_depth_mm, flipper_length_mm),
      ~ quantile(., prob, na.rm = TRUE)
    )
  )
#> `summarise()` has grouped output by 'species', 'island'. You can override using the `.groups` argument.
#> # A tibble: 10 x 6
#> # Groups:   species, island [5]
#>    species   island     prob bill_length_mm bill_depth_mm flipper_length_mm
#>    <fct>     <fct>     <dbl>          <dbl>         <dbl>             <dbl>
#>  1 Adelie    Biscoe     0.25           37.7          17.6              185.
#>  2 Adelie    Biscoe     0.75           40.7          19.0              193
#>  3 Adelie    Dream      0.25           36.8          17.5              185
#>  4 Adelie    Dream      0.75           40.4          18.8              193
#>  5 Adelie    Torgersen  0.25           36.7          17.4              187
#>  6 Adelie    Torgersen  0.75           41.1          19.2              195
#>  7 Chinstrap Dream      0.25           46.3          17.5              191
#>  8 Chinstrap Dream      0.75           51.1          19.4              201
#>  9 Gentoo    Biscoe     0.25           45.3          14.2              212
#> 10 Gentoo    Biscoe     0.75           49.6          15.7              221

# code B
penguins %>%
  group_by(species) %>%
  summarise(
    n = n(),
    across(starts_with("bill_"), mean, na.rm = TRUE),
    Area = mean(bill_length_mm * bill_depth_mm, na.rm = TRUE),
    across(ends_with("_g"), mean, na.rm = TRUE),
  )
#> # A tibble: 3 x 6
#>   species       n bill_length_mm bill_depth_mm  Area body_mass_g
#>   <fct>     <int>          <dbl>         <dbl> <dbl>       <dbl>
#> 1 Adelie      152           38.8          18.3  712.       3701.
#> 2 Chinstrap    68           48.8          18.4  900.       3733.
#> 3 Gentoo      124           47.5          15.0  712.       5076.
```

Created on 2021-02-15 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)

joels · February 20, 2021, 12:10am

Thanks Stephanie! Those definitely look like the same underlying problem. After reading the second issue in your post, I installed the development version of dplyr from GitHub and the issue went away. I will update my GitHub issue to link to the issues you and Nir shared.
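For anyone who lands here with the same errors, a quick way to check whether the installed dplyr matches the version reported as affected in this thread is sketched below. The `affected` vector and the `is_reported_affected()` helper are assumptions for illustration, based only on the reports above (1.0.4 fails, 1.0.2 does not); this is not an official list of affected versions.

```r
# Versions reported as affected in this thread (assumption, not an official list).
affected <- "1.0.4"

# Hypothetical helper: TRUE if `version` matches a reported-affected version.
is_reported_affected <- function(version) {
  as.character(version) %in% affected
}

# Only inspect dplyr if it is installed; packageVersion() does not load it.
if (nzchar(system.file(package = "dplyr"))) {
  v <- utils::packageVersion("dplyr")
  if (is_reported_affected(v)) {
    message("dplyr ", as.character(v),
            " was reported as affected; consider updating.")
    # One way to get the development version (needs the remotes package):
    # remotes::install_github("tidyverse/dplyr")
  }
}
```

The install call is left commented out on purpose, so that running the check never changes the installed packages by itself. Restart R after updating so that the new version is the one that gets loaded.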
