Picking up a time-point and its observation based on a certain date

I have a dataset with 3 time-point date (date1,date2,date3) and I have another reference date (ref_date). Each of this 3 time-point date has two observation value1:value3 and dummy1:dummy3.

data <- data.frame(ref_date = c("2022-03-05",     
                            "2020-10-13",
                            "2025-10-15",
                            "2019-11-22",
                            "2022-12-15",
                            "2021-04-18"),
                   date1 = c("2022-01-05",    
                            "2020-03-13",
                            "2025-07-15",
                            "2019-11-22",
                            "2022-04-15",
                            "2021-12-08"),
                   values1 = 1:6,
                   dummy1 = 45:50,
                   date2 = c("2022-02-05",      
                            "2020-04-13",
                            "2025-06-15",
                            "2019-12-22",
                            "2022-10-15",
                            "2021-04-08"),
                   values2 = 2:7,
                   dummy2 = 60:65,
                   date3 = c("2022-04-05",      
                            "2020-05-13",
                            "2025-08-15",
                            "2019-12-28",
                            "2022-11-15",
                            "2021-04-19"),
                   values3 = 10:15,
                   dummy3 = 70:75)

Now I need to use the ref_date as the reference date and select the date and its observations among the three time-points (date1:date3) that is its closest preceding date. In real data the number of observations for each timepoint is ~ 20 NOT 2 so I need to capture them. The resulting data should look like this. Note the three variables point_date, point_value and point_dummy highlighted in green which is the output I am looking for. Thanks a lot for the help !

my solution

library(tidyverse)
d1 <- data.frame(ref_date = c("2022-03-05",     
                                "2020-10-13",
                                "2025-10-15",
                                "2019-11-22",
                                "2022-12-15",
                                "2021-04-18"),
                   date1 = c("2022-01-05",    
                             "2020-03-13",
                             "2025-07-15",
                             "2019-11-22",
                             "2022-04-15",
                             "2021-12-08"),
                   values1 = 1:6,
                   dummy1 = 45:50,
                   date2 = c("2022-02-05",      
                             "2020-04-13",
                             "2025-06-15",
                             "2019-12-22",
                             "2022-10-15",
                             "2021-04-08"),
                   values2 = 2:7,
                   dummy2 = 60:65,
                   date3 = c("2022-04-05",      
                             "2020-05-13",
                             "2025-08-15",
                             "2019-12-28",
                             "2022-11-15",
                             "2021-04-19"),
                   values3 = 10:15,
                   dummy3 = 70:75)

(d2 <- d1 |> mutate(across(contains("date"),as.Date)) |> rowwise() |> mutate(dlist=list(c(date1,date2,date3)),
                                 dlist2 = list(if_else(ref_date>=dlist,dlist,as.Date("0000-01-01"))),
                                 dchoice = which.min(ref_date-dlist2)) |> select(-dlist,-dlist2))

(d3 <- d2 |> mutate(point_date=(cur_data()[[paste0("date",dchoice)]]),
                    point_value=(cur_data()[[paste0("values",dchoice)]]),
                    point_dummy=(cur_data()[[paste0("dummy",dchoice)]])) |> ungroup())
1 Like

Thank you @nirgrahamuk . This worked perfectly.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.