I'm working on an R package (https://github.com/bradleyboehmke/completejourney) that provides access to real-world retail transaction data (8 data sets in total) that has been used to train data scientists at my company and at a few universities.
A few of the data sets are too large to reside inside the R package (CRAN won't accept packages over 25MB). Consequently, we took a different route by providing a function, `get_data()`, that downloads one or more of the data sets from GitHub. Rather than return the downloaded data sets as a list of tibbles, `get_data()` saves each data set as a tibble in the user's global environment.
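For context, here is a minimal sketch of what the current approach boils down to (the URL layout and data set names are simplified assumptions, not the package's actual code):

```r
# Simplified sketch of the current behavior: each downloaded data set is
# assigned directly into the caller's global environment.
get_data <- function(names) {
  # hypothetical URL layout; the real package hosts its files on GitHub
  base_url <- "https://github.com/bradleyboehmke/completejourney/raw/master/data/"
  for (name in names) {
    dest <- tempfile(fileext = ".rds")
    download.file(paste0(base_url, name, ".rds"), dest, mode = "wb")
    # the assignment below is the part CRAN objected to
    assign(name, readRDS(dest), envir = globalenv())
  }
  invisible(names)
}
```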
We could not find anything in the official R documentation that says you cannot do this, and our help documentation clearly states that `get_data()` will save the data sets in the global environment. However, during the submission process, our first two submissions drew no concerns from the CRAN reviewer, but on the third submission a different CRAN reviewer raised a concern and stated: "Please do not modify the .GlobalEnv and just return a list of loaded objecs." A few questions:
- Is this considered bad practice (loading data into the user's global environment)?
- Is it worth pushing back on this third CRAN reviewer?
- If pushing back on the CRAN reviewer is not an option, I'm looking for alternatives to downloading the 8 data sets as a single list, which requires users to parse out the individual data frames separately if they want them. The obvious alternative is to have them download each data set individually, which is tedious, and since this package is heavily used for educational purposes, many of the people using it may not know the functional programming tools (e.g., `lapply`, purrr) that would simplify that. Rather, I'm trying to make it as convenient for them as possible (a sketch of one compromise I'm considering follows below).
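For concreteness, here is a minimal sketch of the list-returning compromise (data set names and URL layout are placeholder assumptions): the package returns a named list and never touches `.GlobalEnv`, while the documentation shows a one-line `list2env()` call so users who want the old convenience can opt in themselves:

```r
# Sketch of a list-returning get_data(): downloads each requested data set
# and returns them as a named list, leaving the global environment untouched.
get_data <- function(names) {
  base_url <- "https://github.com/bradleyboehmke/completejourney/raw/master/data/"
  datasets <- lapply(names, function(name) {
    dest <- tempfile(fileext = ".rds")
    download.file(paste0(base_url, name, ".rds"), dest, mode = "wb")
    readRDS(dest)
  })
  stats::setNames(datasets, names)
}

# Documented one-liner for users: unpack the list into the workspace.
# The assignment now happens in *user* code, not package code, so it
# should satisfy the CRAN policy ("transactions" and "products" are
# hypothetical data set names here):
list2env(get_data(c("transactions", "products")), envir = globalenv())
```

This keeps the single-call convenience for students while shifting the decision to modify the global environment to the user.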