Sometimes I feel that I'm using a Tidyverse approach, but not the right one, or that a non-Tidyverse process altogether would be better. Here is an example of such a situation:
I have some standard data, i.e. mpg and cyl from mtcars. I also have some label summary statistics that say bad, medium or good for a level of mpg, for cars with a given cyl.
Note: in the example, pretend that the summary labels came from elsewhere. I'm not interested in calculating summary stats in order to label the data.
The data:
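```r
library(dplyr)
library(tibble)
library(purrr)

# mpg and cyl from mtcars, with the car names moved into a "rowname" column
cars <- mtcars[1:2] %>% rownames_to_column() %>% as_tibble()

# Label thresholds per cylinder count, sorted ascending by mpg_l within cyl_l
mpg_label <- tibble(
  cyl_l   = rep(c(4, 6, 8), each = 3),
  label_l = rep(c("bad", "medium", "good"), 3),
  mpg_l   = c(25, 28, Inf, 18, 20, Inf, 15, 18, Inf)
)
```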
Now I want to apply those labels to my data: match cyl to cyl_l and pick the label whose mpg_l is the smallest threshold above the car's mpg (so mpg lies below that threshold and at or above the previous one). Not too difficult.
If you have a minute, stop reading at this point and code how you'd do that task, and, importantly, also write down your thought process!
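To make the rule concrete: within a single cylinder group it is just binning mpg into left-closed intervals. A minimal sketch with base R's cut(), assuming the 6-cylinder thresholds from mpg_label (18, 20, Inf) and a few illustrative mpg values that are not taken from the data above:

```r
# 6-cylinder thresholds from mpg_label: 18, 20, Inf.
# "First mpg_l strictly greater than mpg" means:
#   bad: mpg < 18, medium: 18 <= mpg < 20, good: mpg >= 20
mpg_6cyl <- c(17.8, 18, 19.2, 21)   # illustrative values only

label_6cyl <- cut(mpg_6cyl,
                  breaks = c(-Inf, 18, 20, Inf),
                  labels = c("bad", "medium", "good"),
                  right  = FALSE)   # intervals are [a, b), so 18 -> "medium"

as.character(label_6cyl)
# "bad" "medium" "medium" "good"
```

Note right = FALSE: a car with exactly 18 mpg is not below 18, so it falls into the next bin, matching the mpg_l > m comparison used below.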
Here is mine:
My brain goes into 'some' Tidyverse mode:
- Use mutate to create the new label column in cars.
- Map over the two variables that I need (cyl and mpg).
- Use each cyl and mpg pair to look up its label.
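```r
cars %>%
  mutate(label = map2_chr(cyl, mpg, function(cy, m) {
    mpg_label %>%
      filter(cyl_l == cy, mpg_l > m) %>%  # thresholds above this car's mpg
      slice(1) %>%                        # smallest one, since mpg_label is sorted
      pull(label_l)                       # return the label as a character value
  }))
```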
It works fine; however, I have a gut feeling there are better approaches.
- What if the data gets very big? Is it still a good idea to run filter once per row? Isn't there a much faster vectorized (matrix) variant, or something else? What is the thought process behind that?
- Is there another Tidyverse approach?
- Naturally, I believe I should be looking for a join: join by cylinder and by mpg, where mpg_l is "the smallest of the values greater than mpg". But I'm not sure how to do that with dplyr joins. So the thought process should sound like: "I need to join two tables, join table 1 on/with/anti table 2, join by x and y where z".
- I was looking for a group_by(cyl) approach but wasn't sure how to get rid of the map2.
As I mentioned, I'd be very interested in the thought process you have before you write any code. I'd love to apply it to other examples as well.
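For the join idea, one possible sketch: newer dplyr versions (assumed here: dplyr >= 1.1.0) support non-equi joins through join_by(), and its closest() helper expresses "the smallest mpg_l greater than mpg" directly, so no per-row filter is needed. The data is redefined so the block runs on its own; cars_labelled is just an illustrative name.

```r
library(dplyr)   # assumption: dplyr >= 1.1.0 for join_by() and closest()
library(tibble)

cars <- mtcars[1:2] %>% rownames_to_column() %>% as_tibble()
mpg_label <- tibble(
  cyl_l   = rep(c(4, 6, 8), each = 3),
  label_l = rep(c("bad", "medium", "good"), 3),
  mpg_l   = c(25, 28, Inf, 18, 20, Inf, 15, 18, Inf)
)

# Equality on cylinders, plus an inequality on mpg: among the thresholds
# with mpg < mpg_l, closest() keeps only the nearest one, i.e. the
# smallest threshold strictly above the car's mpg.
cars_labelled <- cars %>%
  left_join(mpg_label,
            by = join_by(cyl == cyl_l, closest(mpg < mpg_l))) %>%
  select(rowname, mpg, cyl, label = label_l)
```

Because every cyl group ends with an Inf threshold, each car should match exactly one row, so the result has one labelled row per car.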
Best,
Jiddu