Hi,
I have a data frame that I wish to split using rsample::initial_time_split. I was wondering if someone could help with the implementation below. I have created a data set where a date can appear more than once. The data set is a subsample of my original data, and I am unable to share the actual table for privacy reasons.
There appears to be a date overlap between the train and test sets: the same date (2021-12-28 in the output below) shows up in both extracts. I would like a clean cut (if possible), so that each date falls in only one of the two sets.
library(rsample)
library(dplyr)
# Split the Data based on time slices
test <- mydf %>%
  sample_frac(0.01) %>%
  mutate(date = custom_date) %>%
  arrange(date)
uv_lag_split <- initial_time_split(test)
train_data <- training(uv_lag_split)
test_data <- testing(uv_lag_split)
c(max(train_data$date), min(test_data$date))
# [1] "2021-12-28" "2021-12-28"
unique(train_data$date) %>% tail()
# [1] "2021-12-23" "2021-12-24" "2021-12-25" "2021-12-26"
# [5] "2021-12-27" "2021-12-28"
unique(test_data$date) %>% head()
# [1] "2021-12-28" "2021-12-29" "2021-12-30" "2021-12-31"
# [5] "2022-01-01" "2022-01-02"
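For illustration, the kind of clean cut I have in mind could be sketched by splitting on a cutoff date instead of a row proportion. This is only a rough sketch under assumptions: `toy` is made-up data standing in for my real table, `cutoff` is a hypothetical name, and it uses base R rather than rsample. The 0.75 mirrors the default `prop = 3/4` of initial_time_split.

```r
# Toy data (hypothetical): several rows per date, sorted by date
set.seed(1)
all_days <- seq(as.Date("2021-12-01"), as.Date("2022-01-31"), by = "day")
toy <- data.frame(date = sort(sample(all_days, size = 500, replace = TRUE)))
toy$value <- rnorm(nrow(toy))

# Date found at the 75% row position (assumed stand-in for prop = 3/4)
cutoff <- toy$date[floor(0.75 * nrow(toy))]

# Rows strictly before the cutoff date go to training;
# the cutoff date and everything after it go to testing
train_data <- toy[toy$date < cutoff, ]
test_data  <- toy[toy$date >= cutoff, ]

# The last training date is now strictly earlier than the first test date
c(max(train_data$date), min(test_data$date))
```

With this approach the training share is no longer exactly 75% of the rows, because every row of the cutoff date is moved to the test set.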
Thank you very much for your time.