I'd like to split a data set into a training set and a test set. The slice_sample function lets me sample by n or prop and respects groups, which is great. anti_join then gives me the remaining rows, given the sliced data. However, anti_join removes every row that matches a sliced row, so if the data contain duplicates, it may remove all copies of a duplicated row rather than only the ones that were sampled.
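To make the problem concrete, here is a minimal sketch of the duplicate issue on a toy data frame. The names `df`, `sampled`, and `rest` are made up for illustration, and `slice(1)` stands in for a random draw so the result is deterministic.

```r
library(dplyr)

# Toy data: rows 1 and 2 are exact duplicates
df <- tibble(x = c(1, 1, 2, 3), y = c("a", "a", "b", "c"))

# Pretend the sampler picked only row 1
sampled <- df %>% slice(1)

# anti_join matches on values, so it drops BOTH copies of (1, "a")
rest <- anti_join(df, sampled, by = c("x", "y"))

nrow(rest)  # 2 rows remain, although only 1 row was sampled out of 4
```

One row was sampled, but two rows disappeared from the remainder, which is exactly the behaviour described above.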
library(dplyr)

set.seed(42)
# A possible tidy way: tag each row with its row number, sample, then drop the sampled rows by number
myiris <- iris %>% mutate(rn = row_number())
data_test <- myiris %>% slice_sample(n = 40)
data_train <- myiris %>% slice(-pull(data_test, rn))
# Not sure it's much better than the base way, though
test_index <- sample.int(n = nrow(iris), size = 40)
dtest <- iris[test_index, ]
dtrain <- iris[-test_index, ]
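Since the question also mentions groups, the row-number idea can probably be combined with a grouped slice_sample. This is only a sketch, assuming dplyr 1.0.0 or later; the 20% proportion and the names `grouped_test` and `grouped_train` are arbitrary choices for illustration. Joining on the `rn` key alone means anti_join removes only the sampled rows, even when other columns contain duplicates.

```r
library(dplyr)

set.seed(42)

# Tag every row with a unique key before sampling
myiris <- iris %>% mutate(rn = row_number())

# Sample 20% of the rows within each Species group
grouped_test <- myiris %>%
  group_by(Species) %>%
  slice_sample(prop = 0.2) %>%
  ungroup()

# Join on the unique key only, so duplicated measurements are not dropped
grouped_train <- myiris %>% anti_join(grouped_test, by = "rn")

nrow(grouped_test)   # 30 (10 per species)
nrow(grouped_train)  # 120
```

The `rn` column can be dropped afterwards with `select(-rn)` if it should not end up in the model data.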