Greetings RStudio Community:
I have a data frame of x and y coordinates representing baseball pitch locations (df_2). I also have a reference data frame containing a region label as well as the corresponding xmin, xmax, ymin, and ymax region parameters (df_1). I'm trying to apply the value of df_1$region to df_2$region when df_2$x and df_2$y are between df_1$xmin & df_1$xmax AND df_1$ymin & df_1$ymax.
I can get the code to run using a nasty series of nested ifelse statements, but ideally the solution would be much faster and more elegant. I’ve tried using purrr and a for loop to no avail.
# Objective:
# Match x and y in df_2 with corresponding region number in df_1
library(tidyverse)
# df_1: region labels and coordinates
load(url("http://aaronbaggett.com/data/df_1.Rda"))
# df_2: x and y coordinates
load(url("http://aaronbaggett.com/data/df_2.Rda"))
# Attempt 1: Using purrr
df_2 %>%
  mutate(region = map2_dbl(x, y,
    ~df_1$region[.x >= df_1$xmin &
                 .x <= df_1$xmax &
                 .y >= df_1$ymin &
                 .y <= df_1$ymax]))
#> Error in mutate_impl(.data, dots): Evaluation error: Result 52 is not a length 1 atomic vector.
df_2[52, ]
#> # A tibble: 1 x 3
#>   region       x     y
#>    <dbl>   <dbl> <dbl>
#> 1      0 -0.0200  1.83
One potential problem with the df_1 region parameters is that when a pitch is directly over one of the borders (see blue lines in the figure below), the condition matches more than one region, so it is ambiguous which region those pitch coordinates should be assigned to. For example, df_2[52, ] could be in either region 27 or 21. The output snippet below is what df_2 should look like after the iteration.
df_2
#> # A tibble: 100 x 3
#>    region      x     y
#>     <dbl>  <dbl> <dbl>
#>  1     25 -1.37  1.42
#>  2     28  0.405 1.21
#>  3     31 -1.37  0.682
#>  4     36  1.58  0.912
#>  5     10  0.304 3.50
#>  6     14 -0.906 3.03
#>  7     23  0.620 2.41
#>  8      9 -0.202 3.38
#>  9     14 -0.987 2.93
#> 10      8 -1.02  3.77
#> # ... with 90 more rows
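The border ambiguity described above can be reproduced on toy data. The sketch below uses two hypothetical adjacent regions that share the border x = 0 (the values are assumptions for illustration, not the contents of df_1). A point on that border satisfies both closed-interval conditions, so the lookup returns two values, which is why map2_dbl() reports a result that is not length 1. The helper names match_regions() and first_region() are made up for this example, and keeping the first match is only one possible tie-break, not necessarily the right rule for the real regions.

```r
# Hypothetical toy data: two adjacent regions sharing the border x = 0.
# These values are illustrative assumptions, not the contents of df_1.Rda.
regions <- data.frame(
  region = c(21, 27),
  xmin   = c(-1, 0),
  xmax   = c(0, 1),
  ymin   = c(1, 1),
  ymax   = c(2, 2)
)

# Return every region whose closed bounds contain the point (x, y).
match_regions <- function(x, y, ref) {
  ref$region[x >= ref$xmin & x <= ref$xmax &
             y >= ref$ymin & y <= ref$ymax]
}

on_border <- match_regions(0, 1.5, regions)    # on the shared border
inside    <- match_regions(0.5, 1.5, regions)  # strictly inside one region
outside   <- match_regions(5, 5, regions)      # outside every region

# One possible tie-break (an assumption, not a rule from the post):
# keep the first matching region and return NA when nothing matches.
first_region <- function(x, y, ref) {
  hits <- match_regions(x, y, ref)
  if (length(hits) == 0) NA_real_ else hits[1]
}

# Apply the tie-break row by row to a small set of hypothetical pitches.
pitches <- data.frame(x = c(0, 0.5, 5), y = c(1.5, 1.5, 5))
pitches$region <- mapply(first_region, pitches$x, pitches$y,
                         MoreArgs = list(ref = regions))
```

Here on_border has length 2 while inside has length 1, so any per-row lookup has to decide what to do with zero or multiple matches before a *_dbl mapper can accept the result.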
Any help is appreciated.