I'm trying to write a function that performs a join of two user-supplied data sets. I don't know for sure what the by-variables will be or even how many the user will want.
I do know that in both data sets the variables to be joined on will have the same names. However, I can't just have my function do a natural join because there could be other variables that would end up being joined on as well.
I've seen online how to use {{
to include a captured name in join_by
, but I haven't seen anywhere how to do this if the user had multiple by-variables.
There are some examples online of passing variable names through the dots, capturing those, and then using the old by=
syntax to specify the join variables. But in my case I'm using the dots to pass arguments to another function.
Here's a short example showing my issue:
# Note: dots are used to forward extra arguments on to anti_join
my_join = function(dt1, dt2, join_vars, ...){
dplyr::anti_join(dt1, dt2, by = dplyr::join_by({{join_vars}}), ...)
}
dt1 = tibble::as_tibble(mtcars, rownames = "model")
dt2 = dplyr::filter(dt1, stringr::str_detect(model, "^Merc"))
# specifying a single column works fine
my_join(dt1, dt2, join_vars = model)
#> # A tibble: 25 × 12
#> model mpg cyl disp hp drat wt qsec vs am gear carb
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 Mazda RX4 … 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 Hornet 4 D… 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 Hornet Spo… 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 Duster 360 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 Cadillac F… 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#> 9 Lincoln Co… 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 10 Chrysler I… 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> # ℹ 15 more rows
# specifying multiple columns doesn't work at all
my_join(dt1, dt2, join_vars = c(model, carb))
#> Error in `dplyr::join_by()`:
#> ! Expressions must use one of: `==`, `>=`, `>`, `<=`, `<`, `closest()`,
#> `between()`, `overlaps()`, or `within()`.
#> ℹ Expression 1 is `c(model, carb)`.
Created on 2024-04-11 with reprex v2.1.0