There is not a good way to do this using only filter(), because your constraint values are from another DF. There are a couple approaches to accomplish this, each I can think of requires making a lookup dataframe containing PatientId, minDate, maxDate from the first DF, joining that in, and then filtering against those values. A reproducible example of this approach is below. I am using the powerful combo of tidyr::nest() and purrr::map(). nest() splits the DF by PatientId to create a column data
which is a list of DFs for each PatientId. map() then applies filter() on each of those DFs into a new list column 'filteredDFs'. unnest() reveals the filtered dataframe from the list in filteredDFs
.
library(dplyr)
library(purrr)
library(tidyr)
# MAKE TEST DATA ----
set.seed(42)
## Dummy Dataframe of Interest
dataFrameOfInterest <- data.frame(
PatientId = rep(c(90011,9002,9003,900444,9005), 500),
Date = sample(seq.Date(as.Date('2022-01-01'),as.Date('2024-04-20'), 'day'),
2500, replace = TRUE
)
)
## Dummy Arrow Mapping
arrow_mapping <- data.frame(
PatientId = rep(c(9001,9002,9003,9004,9005), 100000),
Date = sample(seq.Date(as.Date('2020-01-01'), as.Date('2024-04-20'), 'day'),
500000, replace = TRUE
),
VOI1 = sample(LETTERS, 500000, replace = TRUE),
VOI2 = sample(letters, 500000, replace = TRUE),
VOI3 = sample(1:100000, 500000, replace = TRUE)
)
### 1) Create a lookup of unique ID and date ranges
lu_dfoi <- dataFrameOfInterest %>%
group_by(PatientId) %>%
summarize(
minDate = min(Date, na.rm = TRUE),
maxDate = max(Date, na.rm = TRUE)
)
### 2) Join Min/Max dates by PatientId, getting ID-specific date range
### Nest by PatientId, map through each dataframe in filteredDFs, unnest
output <- arrow_mapping %>%
left_join(., lu_dfoi, by = 'PatientId') %>%
tidyr::nest(data = !PatientId) %>%
mutate(
filteredDFs = purrr::map(seq_along(data), ~
filter(data[[.x]], Date >= minDate, Date <= maxDate)
)
) %>%
unnest(cols = c(filteredDFs)) %>%
select(PatientId,Date,VOI1,VOI2,VOI3)
Cheers