Why do we need a copy of the data for the return value of initial_split()?

cantor · January 26, 2024, 7:11am

Why do we need a copy of the data in the return value of initial_split()? As I see it, returning just the row indices would be enough. Is there any other purpose for returning a copy of the data?

```r
library(tidymodels)
library(readr)

hotels <-
  read_csv("https://tidymodels.org/start/case-study/hotels.csv") %>%
  mutate(across(where(is.character), as.factor))

set.seed(123)
splits <- initial_split(hotels, strata = children)
```
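To see what the split object actually holds, here is a minimal sketch. It uses the built-in `mtcars` data instead of `hotels.csv` to avoid a download, and the name `split_obj` is purely illustrative. It assumes the rsample package (which `library(tidymodels)` attaches) is installed. The object is a list that keeps the original data frame alongside the training row indices in `in_id`.

```r
# Assumes rsample is installed; it is attached by library(tidymodels).
library(rsample)

set.seed(123)
# Hypothetical example object, built from mtcars rather than hotels.csv.
split_obj <- initial_split(mtcars)

# The split is an rsplit: a list holding the full data plus row indices.
class(split_obj)
str(split_obj$in_id)  # row numbers assigned to the training set

# The stored data is the original data frame, not a subset.
identical(split_obj$data, mtcars)
```

So the split does carry the indices you describe; the data sits next to them so that later helpers can produce the subsets without you passing the data again.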

nirgrahamuk · January 26, 2024, 10:55am

initial_split() exists to support the training() and testing() functions, as in this example:

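```r
library(tidymodels)  # provides initial_split(), training(), testing()

set.seed(1353)
car_split <- initial_split(mtcars)
train_data <- training(car_split)  # rows listed in the split's in_id
test_data <- testing(car_split)    # all remaining rows
```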

If you don't like the API, that's fine; you can do it some other way, but that's the API.
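Doing it "otherwise" with indices alone could look like the sketch below. It assumes rsample is installed and reads the `in_id` element of the split directly; the names `train_idx`, `train_manual`, and `test_manual` are hypothetical. The manual subsets contain the same number of rows as the ones training() and testing() return.

```r
library(rsample)

set.seed(1353)
car_split <- initial_split(mtcars)

# Index-only approach: keep the training row numbers and subset yourself.
train_idx <- car_split$in_id
train_manual <- mtcars[train_idx, , drop = FALSE]
test_manual  <- mtcars[-train_idx, , drop = FALSE]

# API approach: the split already carries the data, so no extra argument is needed.
train_data <- training(car_split)
test_data  <- testing(car_split)

# Both approaches select the same number of rows.
nrow(train_manual) == nrow(train_data)
nrow(test_manual) == nrow(test_data)
```

The trade-off is that with bare indices you must keep track of which data frame they belong to, whereas the split object bundles the two together.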

For what it's worth, because of R's copy-on-modify (often called copy-on-write) semantics, the object returned by initial_split() does not tie up significant extra memory: it refers to the same data rather than duplicating it, unless either the split or the original data is later altered. Creating the training and testing sets, however, does copy data, because they are new row subsets of the original.
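One way to check this is with `lobstr::obj_size()`, which, when given several objects, reports their combined size and counts shared components only once. This sketch assumes the lobstr package is installed (it is not part of the original thread). The combined size of `mtcars` and the split should be only slightly larger than `mtcars` alone, while adding the training set should grow it noticeably.

```r
# Assumes rsample and lobstr are installed (install.packages("lobstr")).
library(rsample)
library(lobstr)

set.seed(1353)
car_split <- initial_split(mtcars)

# Size of the original data alone.
size_data <- obj_size(mtcars)

# Data and split together: the split's stored data is shared with mtcars,
# so only the indices and a little metadata are added.
size_with_split <- obj_size(mtcars, car_split)

# training() subsets rows, which allocates new column vectors.
train_data <- training(car_split)
size_with_train <- obj_size(mtcars, train_data)

size_data
size_with_split
size_with_train
```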

system · February 16, 2024, 10:55am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.