I am new to the community, so I am not sure whether my topic is placed correctly here, but surely you will let me know. My question is quite simple, I guess:
I want to create a custom function that requires the user to first define the data source (e.g. a tibble) the function is supposed to operate on, so that in the subsequent arguments I only have to give the variable names without having to address the entire data frame, just as is the case in classic ggplot2 functions, for example:
ggplot(data = my_data, ....variable arguments)
I tried this, but it does not seem to work:
a and b are variables within my_data_frame
my_data_frame is an arbitrary object such as a tibble
alpha <- function(data = x, a, b) {
z = a*b
print(z)
}
alpha(data = my_data_frame, a,b)
I hope it is clear what I mean. I am looking forward to your help.
This is actually one of the more advanced aspects of R, and programming such a function is a bit complex. The underlying principle is called data masking. Data masking and the broader topic of metaprogramming are covered extensively in chapters 17-21 of Hadley Wickham's Advanced R book. I highly recommend reading those chapters, as the topic is too complicated to explain in a couple of lines. An example would look like the following:
suppressMessages(library(dplyr)) # to get starwars dataset
starwars
#> # A tibble: 87 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
#> 4 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 5 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 8 R5-D4 97 32 <NA> white, red red NA none mascu…
#> 9 Biggs D… 183 84 black light brown 24 male mascu…
#> 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
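bmi <- function(data, height, weight) {
  # capture the unevaluated column expressions as quosures
  height <- rlang::enquo(height)
  weight <- rlang::enquo(weight)
  # evaluate them with the columns of `data` in scope;
  # starwars stores height in cm, so divide by 100 to get metres
  rlang::eval_tidy(weight, data) / (rlang::eval_tidy(height, data) / 100)^2
}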
bmi(starwars, height, mass)
#> [1] 26.02758 26.89232 34.72222 33.33007 21.77778 37.87401 27.54821
#> [8] 34.00999 25.08286 23.24598 23.76641 NA 21.54509 24.69136
#> [15] 24.72518 443.42857 26.64360 33.95062 39.02663 25.95156 23.35095
#> [22] 35.00000 31.30194 25.21625 25.79592 25.61728 NA NA
#> [29] 25.82645 26.56250 23.89326 24.67038 NA 17.18034 16.34247
#> [36] NA NA NA 31.88776 NA NA 26.12245
#> [43] NA 17.35892 50.92802 NA 24.46460 23.76641 20.91623
#> [50] 22.64681 NA 14.76843 NA NA 22.63468 NA
#> [57] 24.83565 NA NA 23.88844 19.44637 18.14487 NA
#> [64] 21.47709 NA 23.58984 19.48696 26.01775 16.78076 NA
#> [71] NA 24.03461 NA 12.88625 NA 17.99015 34.07922
#> [78] 24.83746 22.35174 15.14960 18.85192 NA NA NA
#> [85] NA NA 16.52893
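Applied to the function from the question, the same pattern might look like the sketch below. This is only one possible rewrite, not the only way to do it: the toy `my_data_frame` is made up for illustration, and the default `data = x` from the original attempt is dropped on the assumption that it was not needed.

```r
# Hypothetical rewrite of the `alpha()` function from the question.
alpha <- function(data, a, b) {
  # capture the unevaluated argument expressions (with their environments)
  a <- rlang::enquo(a)
  b <- rlang::enquo(b)
  # evaluate each captured expression with the columns of `data` in scope
  rlang::eval_tidy(a, data) * rlang::eval_tidy(b, data)
}

# toy data, invented purely for this example
my_data_frame <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

alpha(my_data_frame, a, b)
#> [1]  4 10 18
```

The original attempt fails because `a` and `b` are evaluated as ordinary arguments, so R looks for objects named `a` and `b` in the calling environment instead of inside `data`. Here the function returns the result rather than calling `print()`, so the value can be assigned or reused.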
Thankfully, depending on what you want to do, there may be an easier way to accomplish this using base::with():
starwars$height + starwars$mass
#> [1] 249.0 242.0 128.0 338.0 199.0 298.0 240.0 129.0 267.0 259.0
#> [11] 272.0 NA 340.0 260.0 247.0 1533.0 247.0 290.0 83.0 245.0
#> [21] 261.2 340.0 303.0 256.0 254.0 263.0 NA NA 108.0 228.0
#> [31] 282.0 281.0 NA 262.0 306.0 NA NA NA 152.0 NA
#> [41] NA 255.0 NA 233.0 139.0 NA 228.0 272.0 280.0 283.0
#> [51] NA 234.0 NA NA 268.0 NA 270.0 NA NA 263.0
#> [61] 226.2 216.0 NA 273.0 NA 262.0 223.0 300.0 317.0 NA
#> [71] NA 94.0 NA 241.0 NA 235.0 375.0 370.0 267.0 226.0
#> [81] 286.0 NA NA NA NA NA 210.0
with(starwars, height + mass)
#> [1] 249.0 242.0 128.0 338.0 199.0 298.0 240.0 129.0 267.0 259.0
#> [11] 272.0 NA 340.0 260.0 247.0 1533.0 247.0 290.0 83.0 245.0
#> [21] 261.2 340.0 303.0 256.0 254.0 263.0 NA NA 108.0 228.0
#> [31] 282.0 281.0 NA 262.0 306.0 NA NA NA 152.0 NA
#> [41] NA 255.0 NA 233.0 139.0 NA 228.0 272.0 280.0 283.0
#> [51] NA 234.0 NA NA 268.0 NA 270.0 NA NA 263.0
#> [61] 226.2 216.0 NA 273.0 NA 262.0 223.0 300.0 317.0 NA
#> [71] NA 94.0 NA 241.0 NA 235.0 375.0 370.0 267.0 226.0
#> [81] 286.0 NA NA NA NA NA 210.0
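If you want the `with()` behaviour inside your own function without rlang, a base R sketch could use `substitute()` and `eval()`, which is roughly what `with()` does internally for data frames. The name `alpha_base` and the toy data frame `df` are made up for illustration.

```r
# Hypothetical base R variant of the `alpha()` function from the question.
alpha_base <- function(data, a, b) {
  # substitute() captures the argument expressions without evaluating them
  a_expr <- substitute(a)
  b_expr <- substitute(b)
  # eval() looks names up in `data` first, then in the caller's environment
  caller <- parent.frame()
  eval(a_expr, data, caller) * eval(b_expr, data, caller)
}

# toy data, invented purely for this example
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

alpha_base(df, a, b)
#> [1]  4 10 18
```

One caveat: a function built on `substitute()` tends to be awkward to call from inside another function, because it captures whatever expression was typed at the call site. The quosure-based approach from rlang is designed to handle that case more gracefully.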