Piping with a vector. Good idea?

boshek · October 23, 2017, 9:26pm

Hi there,

This question may fall into the "why would I do this" category, but I was hoping that I could at least get some feedback on why the following is a bad idea. Or maybe it isn't.

I am building a function that always draws upon the same SQLite database. The data source is always the same. What varies is a vector supplied to that function that creates the desired subset of data. The advantage of piping this vector to the function is that it wraps everything up nicely in a pipe and also takes advantage of dbplyr's laziness when it queries the data. So my question is about piping that vector to another function, rather than piping data frames, and what types of problems that might create. Take this simple example below (using the nycflights13 data):

## Simple function
func1 <- function(carrier_code, data = nycflights13::flights){
  
  flights_sub <- dplyr::filter(data, carrier %in% carrier_code) # keep the requested carriers; %in% handles a vector of codes
  flights_sub <- dplyr::group_by(flights_sub, carrier) # group by carrier
  dplyr::summarise(flights_sub, avg_dep_delay = mean(dep_delay, na.rm = TRUE)) # mean departure delay per carrier
  
}
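
For the SQLite case described above, here is a minimal sketch of how the same function could stay lazy until the end. It assumes an in-memory SQLite database filled from nycflights13 as a stand-in for the real one; con, the table name "flights", and func1_db are hypothetical names, not part of the original setup.

```r
library(dplyr)

# Hypothetical setup: an in-memory SQLite database standing in for the real one
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, nycflights13::flights, "flights")

# Hypothetical variant of func1 that reads from the database table by default
func1_db <- function(carrier_code, data = tbl(con, "flights")) {
  data %>%
    filter(carrier %in% !!carrier_code) %>%  # !! inlines the local vector into the SQL
    group_by(carrier) %>%
    summarise(avg_dep_delay = mean(dep_delay, na.rm = TRUE))
}

lazy_result <- func1_db(c("AA", "AS"))  # still a lazy query; no rows fetched yet
show_query(lazy_result)                 # inspect the SQL that dbplyr generated
result <- collect(lazy_result)          # run the query and return a tibble
DBI::dbDisconnect(con)
```

Because the filter uses %in%, a vector with several carrier codes is translated into a single SQL IN clause, and nothing is pulled into R until collect() is called.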

Now imagine that the first two lines of the pipe below were much more involved, using some sf joins, all to arrive at a vector of carriers. I then pull out the vector of interest and pipe it to my trivial function func1:

library(dplyr)

nycflights13::airlines %>%
  filter(carrier %in% c("AA","AS")) %>%
  pull(carrier) %>% ## left with a vector, which is piped to func1
  func1()

Something about this just feels wrong or against the spirit of tidytools, but I wanted to check here. Is there a case for or against piping vectors instead of data frames in the context of creating tidytools?

Any input is much appreciated.

Sam

rensa · October 23, 2017, 10:37pm

I mean, if it works, it works, right?

That said, if you plan to continue operating on the data after it returns from func1, you might wanna either pipe it into as_data_frame to continue the pipe, or rewrite func1 to instead return a data frame—in which case, you might wanna go the whole hog and look into making your functions work like tidyverse verbs (although, TBH, I'm still working up to this!).
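
A rough sketch of what a verb-style version might look like, assuming the nycflights13 data from the question. The name summarise_dep_delay and its argument order are hypothetical; the only point is that the data frame comes first and a data frame comes back.

```r
library(dplyr)

# Hypothetical data-first variant of func1: data in the first argument,
# tibble out, so it can sit anywhere in a pipe
summarise_dep_delay <- function(data, carrier_code) {
  data %>%
    filter(carrier %in% carrier_code) %>%
    group_by(carrier) %>%
    summarise(avg_dep_delay = mean(dep_delay, na.rm = TRUE))
}

delays <- nycflights13::flights %>%
  summarise_dep_delay(c("AA", "AS")) %>%
  arrange(desc(avg_dep_delay))  # the result is a tibble, so the pipe can continue
```

With this shape the vector of carriers becomes an ordinary argument, and what flows through the pipe is always a data frame.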

edgararuiz · October 24, 2017, 12:08am

Hi, I think your approach makes sense. I actually did something similar in a demo I worked on: I selected the top 5 airports (based on number of flights) and then piped the resulting vector as a filter to get some stats about those airports. I didn't create a function for it, but I can see where that would be useful if you have to run the same code on a regular basis.
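
A rough sketch of that kind of workflow, not the actual demo code. It assumes nycflights13::flights and treats the destination (dest) as the airport; top_dests and top_stats are hypothetical names.

```r
library(dplyr)

# Step 1: find the five busiest destination airports and keep them as a vector
top_dests <- nycflights13::flights %>%
  count(dest, sort = TRUE) %>%
  head(5) %>%
  pull(dest)

# Step 2: use that vector as a filter and compute some stats per airport
top_stats <- nycflights13::flights %>%
  filter(dest %in% top_dests) %>%
  group_by(dest) %>%
  summarise(
    n_flights = n(),
    avg_arr_delay = mean(arr_delay, na.rm = TRUE)
  )
```

Wrapping step 2 in a function that takes the vector as its argument would give something very close to func1 from the original question.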