What are the "core" tidyverse packages?

In the README and the NEWS, some packages in the tidyverse are described as "core", but I couldn't find a clear description of what "core" means.

In my understanding, the packages imported by the tidyverse package fall into three groups:

  1. packages that are "core", which are attached by library(tidyverse): ggplot2, dplyr, ...
  2. packages that aren't attached by default but are treated as members of the tidyverse: lubridate, httr, ...
  3. packages that are not members of the tidyverse and are used merely for visual effects: cli, rstudioapi
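
The grouping above can be inspected roughly as follows. This is only a sketch: the `core` vector is copied from the attached packages in the sessionInfo() output in this thread (tidyverse 1.2.1) and may differ in other versions, and `split_core()` is a hypothetical helper, not a tidyverse function.

```r
# Core packages as attached in this thread's sessionInfo() output
# (tidyverse 1.2.1); an assumption for any other version.
core <- c("ggplot2", "tibble", "tidyr", "readr",
          "purrr", "dplyr", "stringr", "forcats")

# Hypothetical helper: split package names into core and non-core.
split_core <- function(pkgs, core_pkgs = core) {
  list(core = intersect(pkgs, core_pkgs),
       non_core = setdiff(pkgs, core_pkgs))
}

# tidyverse_packages() lists the packages the tidyverse package depends on;
# the guard skips the call if tidyverse is not installed.
if (requireNamespace("tidyverse", quietly = TRUE)) {
  str(split_core(tidyverse::tidyverse_packages(include_self = FALSE)))
}
```

The non-core element mixes groups 2 and 3, since nothing in the dependency list itself marks which packages count as tidyverse members.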

What criteria distinguish these groups?

Here is a reprex of running library(tidyverse) followed by sessionInfo(). As you can see, only the core packages are attached.

library(tidyverse)
sessionInfo()
#> R version 3.4.2 (2017-09-28)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Sierra 10.12.6
#> 
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.2.0      stringr_1.2.0      dplyr_0.7.4.9000  
#> [4] purrr_0.2.4        readr_1.1.1.9000   tidyr_0.7.2       
#> [7] tibble_1.3.4       ggplot2_2.2.1.9000 tidyverse_1.2.1   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.14          cellranger_1.1.0      compiler_3.4.2       
#>  [4] plyr_1.8.4            bindr_0.1             tools_3.4.2          
#>  [7] digest_0.6.12         lubridate_1.7.1       jsonlite_1.5         
#> [10] evaluate_0.10.1       nlme_3.1-131          gtable_0.2.0         
#> [13] lattice_0.20-35       pkgconfig_2.0.1       rlang_0.1.4.9000     
#> [16] psych_1.7.8           cli_1.0.0             yaml_2.1.16          
#> [19] parallel_3.4.2        haven_1.1.0           bindrcpp_0.2         
#> [22] xml2_1.1.9000         httr_1.3.1            knitr_1.17.20        
#> [25] hms_0.4.0             rprojroot_1.2         grid_3.4.2           
#> [28] tidyselect_0.2.3      glue_1.2.0.9000       R6_2.2.2             
#> [31] readxl_1.0.0          foreign_0.8-69        rmarkdown_1.8        
#> [34] modelr_0.1.1          reshape2_1.4.3        magrittr_1.5         
#> [37] backports_1.1.2       scales_0.5.0.9000     htmltools_0.3.6      
#> [40] rvest_0.3.2.9000      assertthat_0.2.0.9000 mnormt_1.5-5         
#> [43] colorspace_1.3-2      stringi_1.1.6         lazyeval_0.2.1.9000  
#> [46] munsell_0.4.3         broom_0.4.3           crayon_1.3.4

Created on 2017-12-15 by the reprex package (v0.1.1.9000).

The non-core members of the tidyverse are installed by

install.packages("tidyverse")

but are not attached. Though the "membership" has changed, you can read here for what makes tidyverse packages part of a coherent system.
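
To check the attached-versus-merely-loaded distinction in your own session, here is a minimal sketch using only base R. `attach_status()` is a hypothetical helper name, not something provided by the tidyverse.

```r
# Hypothetical helper: report whether each package is attached to the
# search path, only loaded as a namespace, or not loaded at all.
attach_status <- function(pkgs) {
  attached <- paste0("package:", pkgs) %in% search()
  loaded   <- pkgs %in% loadedNamespaces()
  ifelse(attached, "attached",
         ifelse(loaded, "loaded only", "not loaded"))
}

# After library(tidyverse), a core package such as dplyr should report
# "attached", while a non-core member such as lubridate should not.
attach_status(c("base", "dplyr", "lubridate"))
```

This mirrors the two sections of sessionInfo(): "other attached packages" versus "loaded via a namespace (and not attached)". A non-core member still needs its own library() call, for example library(lubridate), before its functions can be used without the pkg:: prefix.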

Packages such as cli (since you mentioned it), tidyselect, and crayon are part of what's called r-lib. While the tidyverse consists of highly-opinionated tools for data science, r-lib contains mostly-unopinionated infrastructure tools. Their specific uses differ but, for the most part, r-lib packages are not something the average R user will need. They're more for development, testing, and other activities more akin to programming with R, and they are used by the tidyverse packages themselves (e.g. tidyselect provides the backend for functions like dplyr::select() or dplyr::pull(), as well as several tidyr verbs).

Thanks for the useful information! In particular, I didn't know about the role of r-lib.

But I still don't get the meaning of "core". I know core packages are attached, but why are they so special? What are the differences between core and non-core packages?

For example, stringr and forcats moved to core in version 1.2.0. Why? Is this because the packages had matured enough? If so, what criteria did they meet? Since, as you said, the membership changes as time goes by, I'm very curious about the rule behind the tidyverse...:slight_smile:

@mara will have the inside track on the decisions made by the RStudio team, but from a user's perspective adding stringr and forcats to the core tidyverse makes sense to me because I wind up needing to load them most of the time anyway.

As for why the other core packages are "core" tidyverse, I think it's because they each contribute to an essential activity in data exploration. Below is from http://r4ds.had.co.nz/explore-intro.html.

I said the membership has changed :wink:! The tidyverse package/wrapper has only been around for about a year (see its CRAN package archive). I don't think the plan is for it to be changing continually. The core packages are the ones people use most for the basic pipeline, as @ryanthomas pointed to below. I don't know of a hard and fast rule, and I don't think there is one (@hadley?). Given it's an admittedly opinionated set of tools, I'd say that the why is part of that opinion (which isn't to say it's arbitrary, but there's no case_when() or if_else() rule to write out for it).

The core packages are the ones people use most for the basic pipeline

Ah, I got it, thanks @mara and @ryanthomas! I just wrongly assumed that there was some "hard and fast rule." Sorry for my silly question...

We moved stringr and forcats into core because they provide a bunch of functions that you use all the time (because you almost always have strings and factors to work with).

The real question is why isn't lubridate in core? The answer is that it currently conflicts (in a narrow sense) with too many base functions. In practice the lubridate functions extend or wrap the base alternatives, so the conflicts aren't harmful, but I haven't had time to think through how this should be indicated in the tidyverse loading screen.
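
For anyone who wants to see the kind of name clash described here, a rough sketch follows. `shared_with_base()` is a hypothetical helper, and it only compares against the base package itself (not stats, methods, and so on), so it is an approximation of what attaching lubridate would actually mask.

```r
# Hypothetical helper: names exported by a package that also exist in base.
shared_with_base <- function(pkg) {
  sort(intersect(getNamespaceExports(pkg), ls(baseenv())))
}

# Assumes lubridate is installed; the guard skips the check otherwise.
if (requireNamespace("lubridate", quietly = TRUE)) {
  print(shared_with_base("lubridate"))
}
```

The exact names printed depend on the installed lubridate and R versions. For packages that are already attached, tidyverse_conflicts() reports the conflicts that the tidyverse startup message shows.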

Thanks, I'm getting the feeling that this is exactly what I was wondering about! I hope someone will invent a brilliant loading screen.