Hi, after a recent Posit conference I was wondering whether there is a way to create cross-validation splits that are both spatially and temporally explicit, so that the resampling accounts for both at once. Spatial datasets that are also time series really need this, but there does not seem to be an integrated way to do it.
This book covers the mlr3 ecosystem, where the mlr3spatiotempcv package does that. I was wondering whether wrapping something like this would be possible?
Here is a block I found of interest:
library(mlr3)
library(mlr3spatiotempcv)
task_st = tsk("cookfarm_mlr3")
task_st$set_col_roles("SOURCEID", roles = "space")
task_st$set_col_roles("Date", roles = "time")
resampling = rsmp("sptcv_cstf", folds = 5)
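To make concrete what I mean by accounting for both at once, here is a minimal base-R sketch of a leave-location-and-time-out split, which is my understanding of the idea behind `sptcv_cstf`. It is not the mlr3spatiotempcv implementation; the toy data, the column names (`site`, `year`), and the helper `llto_folds()` are all hypothetical.

```r
# Toy data: 6 sites observed in each of 6 years (hypothetical names)
toy <- expand.grid(site = paste0("s", 1:6), year = 2015:2020,
                   stringsAsFactors = FALSE)

# Hypothetical helper: leave-location-and-time-out folds.
# Each unique site and each unique year is assigned to one of k groups.
# For fold i, the test set is the rows whose site AND year are both in
# group i; the training set is the rows whose site and year are both
# outside group i. Rows sharing only a site or only a year with the
# test set are dropped from that fold.
llto_folds <- function(data, space, time, k, seed = 42) {
  set.seed(seed)
  sites <- unique(data[[space]])
  times <- unique(data[[time]])
  # Randomly partition the unique sites and times into k groups
  site_grp <- setNames(sample(rep_len(seq_len(k), length(sites))), sites)
  time_grp <- setNames(sample(rep_len(seq_len(k), length(times))), times)
  # Look up the group of every row
  s <- unname(site_grp[as.character(data[[space]])])
  t <- unname(time_grp[as.character(data[[time]])])
  lapply(seq_len(k), function(i) {
    list(train = which(s != i & t != i),
         test  = which(s == i & t == i))
  })
}

folds <- llto_folds(toy, space = "site", time = "year", k = 3)
```

In each fold the test rows share no site and no year with the training rows, which is the property I am after.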
Presently, I have been using either one of these:
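library(dplyr)    # %>%, arrange()
library(rsample)  # initial_time_split(), training(), testing()
library(timetk)   # time_series_cv()

# Split into train and test datasets - can be temporal OR spatial
set.seed(42)
presence_split <- initial_time_split(
  presence_df %>% arrange(date),
  prop = 0.75)
presence_train <- training(presence_split)
presence_test  <- testing(presence_split)

# If temporal, create time-series CV with controlled overlap and lag-aware splits
temporal_folds <- time_series_cv(
  presence_train,
  date_var = date,
  initial = "3 years",
  assess = "1 year",
  skip = "1 year",
  cumulative = TRUE,
  slice_limit = 10)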
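library(sf)       # st_as_sf()
library(blockCV)  # cv_spatial()

# Alternatively, perform a spatial CV, perhaps on the same time-split training data?
# That has been my approach so far, but I am not very confident in it.
# Note: crs = 26915 (NAD83 / UTM zone 15N) assumes lon/lat already hold projected
# coordinates; if they are decimal degrees, they would need crs = 4326 here,
# followed by the st_transform() below.
presence_train_sf <- presence_train %>%
  st_as_sf(coords = c("lon", "lat"), crs = 26915)
  # sf::st_transform(crs = 26915)
spatial_folds <- cv_spatial(
  presence_train_sf, k = 10,
  selection = "systematic")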
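One way I imagine the two steps being combined is to build the fold indices by hand and wrap them with `rsample::make_splits()` and `rsample::manual_rset()`, so the result can be used wherever an rset is expected. The following is only a rough sketch under my own assumptions: the toy data and its `block` (spatial block ID, e.g. taken from a spatial CV step) and `year` columns are hypothetical, and the fold rule is just one possible choice, not something rsample or blockCV provides out of the box.

```r
library(rsample)

# Toy data: 4 spatial blocks x 4 years (hypothetical columns)
toy_st <- expand.grid(block = 1:4, year = 2017:2020)

# Forward-chaining in time, hold-out in space: for each test year,
# hold out one spatial block, assess on that block in that year, and
# train only on the other blocks in earlier years.
test_years <- 2019:2020
grid <- expand.grid(block = unique(toy_st$block), year = test_years)

splits <- lapply(seq_len(nrow(grid)), function(i) {
  idx <- list(
    analysis   = which(toy_st$block != grid$block[i] &
                         toy_st$year < grid$year[i]),
    assessment = which(toy_st$block == grid$block[i] &
                         toy_st$year == grid$year[i])
  )
  make_splits(idx, data = toy_st)
})

# Wrap the hand-built splits into an rset
st_folds <- manual_rset(splits, ids = paste0("Fold", seq_along(splits)))
```

Each row of `st_folds` then holds a split whose assessment set is separated from the analysis set in both space and time, and `analysis()` / `assessment()` work on it as usual.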
I would appreciate any insight you might have into this. Thanks!