I am trying to determine whether a given latitude/longitude (lat/long) falls inside a geofenced area. At the moment, these two pieces of information are contained in two separate dataframes: the lat/long data is in a dataframe called checks (below), and the geofenced areas are in a dataframe called container.
library(dplyr)
library(purrr)
library(stringr)
checks <- structure(list(name = c("a", "b", "c"), latitude = c(43.0988803401934,
42.1276251670733, 40.2629180055808), longitude = c(74.5229195309954,
75.6848258586636, 73.2005188397627)), row.names = c(NA, -3L), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
container <- structure(list(id = 1:3, name = c("location_1", "location_2",
"location_3"), lat = c(40.7130015437452, 40.8207479655434, 42.6486260163842
), long = c(74.1955991671793, 75.0798275814, 74.8702938787559
), box_lat_north = c(41.7630015437452, 41.8707479655434, 43.6986260163842
), box_lat_south = c(39.6630015437452, 39.7707479655434, 41.5986260163842
), box_long_west = c(75.2455991671793, 76.1298275814, 75.9202938787559
), box_long_east = c(73.1455991671793, 74.0298275814, 73.8202938787559
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
For each row in checks, I want to know whether its lat/long falls within any of the geofenced areas specified in container. If it does, I want to log this information. At the moment, here is how I am approaching this problem:
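The containment rule being applied can be written as one vectorized predicate that tests a whole vector of points against a single box at once, with no per-row function call. The helper name in_box() is hypothetical and not part of the original code; the sketch assumes, as the data above suggests, that longitudes are stored as positive numbers, so the west edge of each box is the larger value and the east edge the smaller one.

```r
# Hypothetical helper (not from the original code): TRUE where a point lies
# strictly inside a box. Longitudes here are stored as positive numbers,
# so the west edge is the larger value and the east edge the smaller one.
in_box <- function(lat, long, north, south, west, east) {
  lat > south & lat < north & long > east & long < west
}

# The three points from checks, tested against the location_3 box
lat  <- c(43.0988803401934, 42.1276251670733, 40.2629180055808)
long <- c(74.5229195309954, 75.6848258586636, 73.2005188397627)
inside_3 <- in_box(lat, long,
                   north = 43.6986260163842, south = 41.5986260163842,
                   west  = 75.9202938787559, east  = 73.8202938787559)
inside_3  # TRUE TRUE FALSE: a and b are inside, c is not
```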
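single_lookup <- function(the_name, the_lat, the_long, id){
  print(id)  # progress indicator
  # Recover the full container row (including its box bounds) for this geofence
  one_row <- container %>%
    filter(str_detect(name, the_name), lat == the_lat, long == the_long)
  # Keep the points in checks that fall strictly inside that box
  # (longitudes are stored as positive numbers, so west > east)
  found <- checks %>%
    filter(latitude < one_row$box_lat_north, latitude > one_row$box_lat_south,
           longitude < one_row$box_long_west, longitude > one_row$box_long_east) %>%
    mutate(hot_zone = the_name) %>%
    select(name, hot_zone)
  found
}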
results <- container %>%
  mutate(res = pmap(list(name, lat, long, id), single_lookup))
results
tabled <- map_dfr(results$res, bind_rows)
tabled
This solution works, but it is rather slow when run on millions of observations in both checks and container. I would like to speed up this process, if possible.
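Part of the per-geofence cost is the extra filter() on container inside single_lookup(), which only exists to recover bounds that are already in the row being iterated over. A minimal sketch, assuming the same column names as above: pmap() over a data frame passes each row's columns as named arguments, so the bounds can arrive directly. The function name lookup_box() is hypothetical. This still scans every check once per geofence, so it trims overhead rather than changing how the work scales.

```r
library(dplyr)
library(purrr)

# Trimmed copies of the question's data (only the columns used here)
checks <- tibble(
  name      = c("a", "b", "c"),
  latitude  = c(43.0988803401934, 42.1276251670733, 40.2629180055808),
  longitude = c(74.5229195309954, 75.6848258586636, 73.2005188397627)
)
container <- tibble(
  id            = 1:3,
  name          = c("location_1", "location_2", "location_3"),
  box_lat_north = c(41.7630015437452, 41.8707479655434, 43.6986260163842),
  box_lat_south = c(39.6630015437452, 39.7707479655434, 41.5986260163842),
  box_long_west = c(75.2455991671793, 76.1298275814, 75.9202938787559),
  box_long_east = c(73.1455991671793, 74.0298275814, 73.8202938787559)
)

# Hypothetical variant of single_lookup(): pmap() matches container columns
# to arguments by name; `...` swallows columns not used here (id).
lookup_box <- function(name, box_lat_north, box_lat_south,
                       box_long_west, box_long_east, ...) {
  checks %>%
    filter(latitude < box_lat_north, latitude > box_lat_south,
           longitude < box_long_west, longitude > box_long_east) %>%
    # .env$name is the geofence name passed in, not the checks$name column
    mutate(hot_zone = .env$name) %>%
    select(name, hot_zone)
}

# One (possibly empty) data frame per geofence, stacked into one table
tabled <- bind_rows(pmap(container, lookup_box))
tabled
```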
One idea is to use furrr to run this analysis in parallel. Is this among the best options? Is it possible to vectorize these operations using list columns or otherwise? I am open to any and all suggestions!
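One vectorized alternative, sketched here under the assumption that dplyr 1.1.0 or later is available (the release that added inequality conditions to join_by()), is to express the containment test as a single non-equi join instead of one function call per geofence. In each condition the left-hand column comes from checks and the right-hand column from container. This is a sketch, not a benchmarked answer: output size still grows with the number of (point, box) matches, and a point inside several overlapping boxes yields one row per box, as in the original approach.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0 for inequality joins in join_by()

# Trimmed copies of the question's data (only the columns the join needs)
checks <- tibble(
  name      = c("a", "b", "c"),
  latitude  = c(43.0988803401934, 42.1276251670733, 40.2629180055808),
  longitude = c(74.5229195309954, 75.6848258586636, 73.2005188397627)
)
container <- tibble(
  name          = c("location_1", "location_2", "location_3"),
  box_lat_north = c(41.7630015437452, 41.8707479655434, 43.6986260163842),
  box_lat_south = c(39.6630015437452, 39.7707479655434, 41.5986260163842),
  box_long_west = c(75.2455991671793, 76.1298275814, 75.9202938787559),
  box_long_east = c(73.1455991671793, 74.0298275814, 73.8202938787559)
)

# One join replaces the per-geofence loop. Longitudes are positive here,
# so "inside" means east < longitude < west.
tabled <- checks %>%
  inner_join(
    container %>% select(hot_zone = name, starts_with("box_")),
    by = join_by(
      latitude  < box_lat_north,
      latitude  > box_lat_south,
      longitude < box_long_west,
      longitude > box_long_east
    )
  ) %>%
  select(name, hot_zone)

tabled
```

Because matching is done inside the join rather than in a per-row R function, this usually avoids the loop overhead entirely; data.table's non-equi joins (the on= argument) are another option along the same lines.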
Thanks in advance for your time and help!
Hope