I've been investigating the behaviour of group_by() and grouped data along with splitting, nesting, etc. I was interested to see that output is sorted alphanumerically for splitting, but not for tidyr::nest(). In the example dataset I've arranged by descending homeworld to put Coruscant before Alderaan. After group_by(), the group_keys() are sorted alphabetically, and group_split() works the same way as base R split(): both sort by the grouping variable first. However, tidyr::nest() on grouped data does not sort; it keeps the groups in order of first appearance. I just want to check whether this is expected behaviour and whether it's reasonable. It caught me by surprise the first time I saw it, but having seen that group_split() is consistent with base R split(), I don't think it's a particular issue. Maybe just something to be aware of.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
by_homeworld <- starwars %>%
  filter(homeworld %in% c("Coruscant", "Alderaan")) %>%
  select(homeworld, name) %>%
  arrange(desc(homeworld)) %>%
  group_by(homeworld)
by_homeworld %>%
  group_keys()
#> # A tibble: 2 x 1
#> homeworld
#> <chr>
#> 1 Alderaan
#> 2 Coruscant
by_homeworld %>%
  ungroup()
#> # A tibble: 6 x 2
#> homeworld name
#> <chr> <chr>
#> 1 Coruscant Finis Valorum
#> 2 Coruscant Adi Gallia
#> 3 Coruscant Jocasta Nu
#> 4 Alderaan Leia Organa
#> 5 Alderaan Bail Prestor Organa
#> 6 Alderaan Raymus Antilles
by_homeworld %>%
  tidyr::nest()
#> # A tibble: 2 x 2
#> # Groups: homeworld [2]
#> homeworld data
#> <chr> <list>
#> 1 Coruscant <tibble [3 x 1]>
#> 2 Alderaan <tibble [3 x 1]>
starwars %>%
  filter(homeworld %in% c("Coruscant", "Alderaan")) %>%
  select(homeworld, name) %>%
  split(.$homeworld)
#> $Alderaan
#> # A tibble: 3 x 2
#> homeworld name
#> <chr> <chr>
#> 1 Alderaan Leia Organa
#> 2 Alderaan Bail Prestor Organa
#> 3 Alderaan Raymus Antilles
#>
#> $Coruscant
#> # A tibble: 3 x 2
#> homeworld name
#> <chr> <chr>
#> 1 Coruscant Finis Valorum
#> 2 Coruscant Adi Gallia
#> 3 Coruscant Jocasta Nu
by_homeworld %>%
  group_split()
#> [[1]]
#> # A tibble: 3 x 2
#> homeworld name
#> <chr> <chr>
#> 1 Alderaan Leia Organa
#> 2 Alderaan Bail Prestor Organa
#> 3 Alderaan Raymus Antilles
#>
#> [[2]]
#> # A tibble: 3 x 2
#> homeworld name
#> <chr> <chr>
#> 1 Coruscant Finis Valorum
#> 2 Coruscant Adi Gallia
#> 3 Coruscant Jocasta Nu
#>
#> attr(,"ptype")
#> # A tibble: 0 x 2
#> # ... with 2 variables: homeworld <chr>, name <chr>
by_homeworld %>%
  group_split() %>%
  map_df(I)
#> # A tibble: 6 x 2
#> homeworld name
#> <chr> <chr>
#> 1 Alderaan Leia Organa
#> 2 Alderaan Bail Prestor Organa
#> 3 Alderaan Raymus Antilles
#> 4 Coruscant Finis Valorum
#> 5 Coruscant Adi Gallia
#> 6 Coruscant Jocasta Nu
Created on 2020-03-03 by the reprex package (v0.3.0)
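In case it helps anyone who runs into the same mismatch, here is a minimal sketch of two possible workarounds, not a statement of intended behaviour. The object names nested_sorted and split_in_row_order are my own, and the second option assumes that grouping on a factor orders groups by factor level rather than alphabetically.

```r
library(dplyr)
library(tidyr)

# Same grouped data as in the reprex above
by_homeworld <- starwars %>%
  filter(homeworld %in% c("Coruscant", "Alderaan")) %>%
  select(homeworld, name) %>%
  arrange(desc(homeworld)) %>%
  group_by(homeworld)

# Option 1: sort the nested result explicitly so its row order
# matches group_keys() and group_split()
nested_sorted <- by_homeworld %>%
  nest() %>%
  ungroup() %>%
  arrange(homeworld)

# Option 2: make splitting follow the row order instead, by turning the
# grouping variable into a factor whose levels are in order of appearance
split_in_row_order <- by_homeworld %>%
  ungroup() %>%
  mutate(homeworld = factor(homeworld, levels = unique(homeworld))) %>%
  group_split(homeworld)
```

With the first option the nested tibble should list Alderaan before Coruscant; with the second, the first element of the split should hold the Coruscant rows, matching what nest() gives on the grouped data.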
Session info
sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 16299)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252
#> [2] LC_CTYPE=English_United Kingdom.1252
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_0.3.3 tidyr_1.0.2 dplyr_0.8.4
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.3 knitr_1.28 magrittr_1.5 tidyselect_1.0.0
#> [5] R6_2.4.1 rlang_0.4.4 fansi_0.4.1 stringr_1.4.0
#> [9] highr_0.8 tools_3.6.2 xfun_0.12 utf8_1.1.4
#> [13] cli_2.0.1 htmltools_0.4.0 yaml_2.2.1 assertthat_0.2.1
#> [17] digest_0.6.25 tibble_2.1.3 lifecycle_0.1.0 crayon_1.3.4
#> [21] vctrs_0.2.3 glue_1.3.1 evaluate_0.14 rmarkdown_2.1
#> [25] stringi_1.4.6 compiler_3.6.2 pillar_1.4.3 pkgconfig_2.0.3