Using get() in a function to refer to package data with a string?

mfherman · December 27, 2018, 2:03pm

I'm developing a package that primarily consists of spatial data files for various geographies in New York City (boroughs, community districts, census tracts, etc.) as well as some helper functions to retrieve and filter these objects. I have each of the geographies saved as .rda files in the /data folder of the package so I can use the objects by referring to their name, like tracts_sf.

The main function in the package lets a user specify a geography (like borough, cd, tract) and returns an sf object with those boundaries. For example nyc_boundaries(geography = "borough") will return an sf object of the borough boundaries.

I'm wondering the best way to reference the appropriate data file to return (in this case boros_sf) in my function. Currently, I'm creating strings of the data files based on the input argument and then using get() to return the correct file, but this doesn't feel like the best way to do things. Here is a simplified version of my function:

nyc_boundaries <- function(geography = c("borough", "cd", "tract"),
                           resolution = c("low", "high")) {

  geography <- match.arg(geography)
  resolution <- match.arg(resolution)

  if (geography == "borough") {
    .geo <- "boros"
  } else if (geography == "cd") {
    .geo <- "cds"
  } else {
    .geo <- "tracts"
  }

  if (resolution == "low") {
    .shp_call <- paste0(.geo, "_sf_simple")
  } else {
    .shp_call <- paste0(.geo, "_sf")
  }

  shp <- get(.shp_call)

  return(shp)
}

nteetor · December 27, 2018, 2:29pm

Have you considered more specific if/else statements? This would allow you to return the exact object and avoid using get().

if (resolution == "low") {
  if (geography == "borough") {
    boros_sf_simple
  } else if (geography == "cd") {
    cds_sf_simple
  } else {
    tracts_sf_simple
  }
} else {
  if (geography == "borough") {
    boros_sf
  } else if (geography == "cd") {
    cds_sf
  } else {
    tracts_sf
  }
}

If that does not sit well, you could consider three small helper functions:

get_boros <- function(res) {
  if (res == "high") boros_sf else boros_sf_simple
}

get_cds <- function(res) {
  if (res == "high") cds_sf else cds_sf_simple
}

get_tracts <- function(res) {
  if (res == "high") tracts_sf else tracts_sf_simple
}

Then the body of the main function can be adjusted. In this case, nyc_boundaries() no longer has to handle resolution itself; it simply passes the value along to the helper functions.

if (geography == "borough") {
  get_boros(resolution)
} else if (geography == "cd") {
  get_cds(resolution)
} else {
  get_tracts(resolution)
}

I hope this helps and best of luck with the package.

mfherman · December 27, 2018, 2:55pm

Thanks @nteetor! Your first suggestion is actually how I initially had it coded, but my real function can return more than just boroughs, cds, and tracts, and it felt like there might be a better way than specifying each combination of geography and resolution. But get() feels somewhat brittle, so I might just go back to the other way.
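One way get() might be made less brittle is to build the object name from small lookup vectors and restrict the lookup to a single known environment. The following is only a sketch under stated assumptions: data_env and the data frames in it are hypothetical stand-ins for the package datasets (they are not real sf objects), and geo_prefix and res_suffix are illustrative names that do not come from the package.

```r
# Hypothetical stand-ins for the package datasets, kept in their own environment.
data_env <- new.env()
for (nm in c("boros", "cds", "tracts")) {
  assign(paste0(nm, "_sf"), data.frame(geo = nm, res = "high"), envir = data_env)
  assign(paste0(nm, "_sf_simple"), data.frame(geo = nm, res = "low"), envir = data_env)
}

# Lookup vectors: adding a geography means adding one entry here.
geo_prefix <- c(borough = "boros", cd = "cds", tract = "tracts")
res_suffix <- c(low = "_sf_simple", high = "_sf")

nyc_boundaries <- function(geography = c("borough", "cd", "tract"),
                           resolution = c("low", "high")) {
  geography  <- match.arg(geography)
  resolution <- match.arg(resolution)

  obj_name <- paste0(geo_prefix[[geography]], res_suffix[[resolution]])

  # inherits = FALSE: fail with an error instead of silently picking up
  # a same-named object from some other environment.
  get(obj_name, envir = data_env, inherits = FALSE)
}

nyc_boundaries("cd", "high")
```

Because match.arg() validates both arguments first, get() is only ever called with a name assembled from the lookup vectors.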

FYI, a very alpha version of the package is here if you are interested: https://github.com/mfherman/nycgeo

nwerth · January 7, 2019, 7:12pm

I like to replace simple if-else chains with switch expressions. I've also come to appreciate "flat" code. It's easy to read and not much harder to write with a decent text editor (and RStudio is more than decent). More often, I'll use logic and string composition to write flat code, which I then copy into the R file.

Using your example:

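nyc_boundaries <- function(geography = c("borough", "cd", "tract"),
                           resolution = c("low", "high")) {
  # Reduce each argument to a single validated value; without this, the
  # default vectors would be pasted together and switch() would fail.
  geography <- match.arg(geography)
  resolution <- match.arg(resolution)
  selection <- paste(geography, resolution)
  switch(selection,
    "borough low"  = boros_sf_simple,
    "borough high" = boros_sf,
    "cd low"       = cds_sf_simple,
    "cd high"      = cds_sf,
    "tract low"    = tracts_sf_simple,
    "tract high"   = tracts_sf
  )
}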

The cool thing: this respects lazy loading. switch() evaluates only the matching branch, so nyc_boundaries("cd", "low") will only load the cds_sf_simple object.
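To illustrate the lazy-evaluation point, here is a small sketch that uses delayedAssign() to create promises that behave roughly like lazy-loaded datasets. The character strings are hypothetical stand-ins for the real objects, and loaded and pick() are illustrative names only.

```r
# Records which stand-in objects have actually been evaluated.
loaded <- character()

# Each promise appends its own name to `loaded` the first time it is forced.
delayedAssign("cds_sf_simple", {
  loaded <- c(loaded, "cds_sf_simple")
  "cd low data"
})
delayedAssign("cds_sf", {
  loaded <- c(loaded, "cds_sf")
  "cd high data"
})

pick <- function(selection) {
  # switch() evaluates only the alternative that matches `selection`.
  switch(selection,
    "cd low"  = cds_sf_simple,
    "cd high" = cds_sf
  )
}

pick("cd low")
loaded  # contains only "cds_sf_simple"; the cds_sf promise was never forced
```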

hughparsonage · January 8, 2019, 11:57am

Perhaps getExportedValue? For example,

getExportedValue("nycflights13", "airports")
#> # A tibble: 1,458 x 8
#>    faa   name                    lat    lon   alt    tz dst   tzone        
#>    <chr> <chr>                 <dbl>  <dbl> <int> <dbl> <chr> <chr>        
#>  1 04G   Lansdowne Airport      41.1  -80.6  1044    -5 A     America/New_~
#>  2 06A   Moton Field Municipa~  32.5  -85.7   264    -6 A     America/Chic~
#>  3 06C   Schaumburg Regional    42.0  -88.1   801    -6 A     America/Chic~
#>  4 06N   Randall Airport        41.4  -74.4   523    -5 A     America/New_~
#>  5 09J   Jekyll Island Airport  31.1  -81.4    11    -5 A     America/New_~
#>  6 0A9   Elizabethton Municip~  36.4  -82.2  1593    -5 A     America/New_~
#>  7 0G6   Williams County Airp~  41.5  -84.5   730    -5 A     America/New_~
#>  8 0G7   Finger Lakes Regiona~  42.9  -76.8   492    -5 A     America/New_~
#>  9 0P2   Shoestring Aviation ~  39.8  -76.6  1000    -5 U     America/New_~
#> 10 0S9   Jefferson County Intl  48.1 -123.    108    -8 A     America/Los_~
#> # ... with 1,448 more rows

Created on 2019-01-08 by the reprex package (v0.2.1)
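Applied to the original question, the object name could be assembled as a string and handed to getExportedValue(). The sketch below is an assumption about how that might look, not the package's actual code: get_pkg_data() is a hypothetical helper, and it is demonstrated against the base datasets package only so that it runs without the NYC data installed.

```r
# Hypothetical helper: look up a dataset by name in a given package.
# `pkg` is a parameter here only so the sketch can run against 'datasets';
# inside a real package it would presumably be fixed to the package's own name.
get_pkg_data <- function(name, pkg = "datasets") {
  getExportedValue(pkg, name)
}

# Build the name from pieces, as the original function does with paste0().
obj_name <- paste0("mt", "cars")
head(get_pkg_data(obj_name))
```

Since `::` performs the same lookup, this should work whether or not the package is attached with library().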

mfherman · January 9, 2019, 8:04pm

Oh nice, I like both of these options. I didn't know about getExportedValue(), and that is pretty much what I was looking for in my original question. For now, I've decided to refactor the code a bit using more if/else statements because in my actual function, I need to assign additional variables and objects depending on the geography selection, so that seems to work better for me than switch() as @nwerth suggested.
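For comparison, switch() can also carry several values per geography by returning a list from each branch. This is only a sketch of that alternative: geo_settings() and the field names prefix and id_col (and their values) are hypothetical and do not come from the package.

```r
# Hypothetical helper: return all per-geography settings in one list.
geo_settings <- function(geography = c("borough", "cd", "tract")) {
  geography <- match.arg(geography)
  switch(geography,
    borough = list(prefix = "boros",  id_col = "boro_id"),
    cd      = list(prefix = "cds",    id_col = "cd_id"),
    tract   = list(prefix = "tracts", id_col = "tract_id")
  )
}

settings <- geo_settings("cd")
settings$prefix
settings$id_col
```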

Thanks for the input!