Hi, is there a way to label the rows of a time series dataset as training and test data, split 80 and 20 percent? For example:
df <- structure(list(Date = structure(c(3L, 4L, 5L, 6L, 1L, 2L), .Label = c("1/9/2019",
"2/9/2019", "27/8/2019", "28/8/2019", "29/8/2019", "30/8/2019"
), class = "factor"), Val = c(1, 2, 3, 4, 5, 6)), class = "data.frame", row.names = c(NA,
-6L))
df
Date Val Split
27/8/2019 2 Training
28/8/2019 2 Training
29/8/2019 4 Training
30/8/2019 4 Training
28/9/2019 8 Test
29/9/2019 9 Test
One possible way:
dataset <- data.frame(Date = c("27/8/2019", "28/8/2019", "29/8/2019", "30/8/2019", "1/9/2019", "2/9/2019"),
Val = c(1, 2, 3, 4, 5, 6))
dataset$Split <- rep(x = c("Training", "Test"),
times = c(floor(x = 0.8 * nrow(x = dataset)), ceiling(x = 0.2 * nrow(x = dataset))))
dataset
#> Date Val Split
#> 1 27/8/2019 1 Training
#> 2 28/8/2019 2 Training
#> 3 29/8/2019 3 Training
#> 4 30/8/2019 4 Training
#> 5 1/9/2019 5 Test
#> 6 2/9/2019 6 Test
Created on 2019-10-15 by the reprex package (v0.3.0)
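A slightly more defensive variant, as a sketch: compute the training size once and take the test size as the remainder, so the two counts always add up to the number of rows regardless of how `0.8 * n` and `0.2 * n` round. The names `n_total` and `n_train` are my own and not part of the original answer.

```r
# Same example data as above
dataset <- data.frame(
  Date = c("27/8/2019", "28/8/2019", "29/8/2019", "30/8/2019", "1/9/2019", "2/9/2019"),
  Val = c(1, 2, 3, 4, 5, 6)
)

n_total <- nrow(dataset)
n_train <- floor(0.8 * n_total)  # size of the training part

# The test part is whatever is left, so the counts always sum to n_total
dataset$Split <- rep(c("Training", "Test"),
                     times = c(n_train, n_total - n_train))
dataset
```

With six rows this gives the same labels as the version above: four Training rows followed by two Test rows.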
Then, if you want to split it into two data frames, you can do this (among many other possibilities):
zeallot::`%<-%`(x = c(test, train),
value = split(x = dataset,
f = dataset$Split))
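If you would rather not depend on zeallot, here is a base R sketch of the same idea. It also shows why `test` comes before `train` in the call above: `split()` returns its pieces in the order of the factor levels, which are sorted alphabetically for a character column. The name `parts` is just an illustrative choice.

```r
dataset <- data.frame(
  Date = c("27/8/2019", "28/8/2019", "29/8/2019", "30/8/2019", "1/9/2019", "2/9/2019"),
  Val = c(1, 2, 3, 4, 5, 6)
)
dataset$Split <- rep(c("Training", "Test"), times = c(4, 2))

# split() returns a named list with one data frame per label
parts <- split(x = dataset, f = dataset$Split)
names(parts)  # "Test" comes before "Training"

# Picking elements by name avoids relying on their order
train <- parts[["Training"]]
test <- parts[["Test"]]
```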
However, if splitting is your objective, and not creating a new column, then probably this is better:
train_indices <- seq_len(length.out = floor(x = 0.8 * nrow(x = dataset)))
train <- dataset[train_indices,]
test <- dataset[-train_indices,]
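Note that this relies on the rows already being in chronological order. If that is not guaranteed, a possible sketch is to parse the dates and sort first. I am assuming here that the dates are in day/month/year format, as in the example; the shuffled `df` below is made up for illustration.

```r
# Hypothetical example: same dates as above, but in shuffled row order
df <- data.frame(
  Date = c("1/9/2019", "27/8/2019", "2/9/2019", "28/8/2019", "30/8/2019", "29/8/2019"),
  Val = c(5, 1, 6, 2, 4, 3)
)

# as.character() also covers the case where Date is stored as a factor
df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")

# Sort chronologically so the earliest 80% become the training set
df <- df[order(df$Date), ]

train_indices <- seq_len(floor(0.8 * nrow(df)))
train <- df[train_indices, ]
test <- df[-train_indices, ]
```

One caveat: if the data has so few rows that `train_indices` is empty, `df[-train_indices, ]` returns zero rows rather than everything, so it may be worth checking the size before splitting.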
Hope this helps.
Short feedback on your code
If you provide code to create your data and then show us the output of what you want and not what you have, it's confusing.
For small datasets, and especially if they are data.frame, dput is a bit too much. Perhaps creating the data directly, as I did in my code above, makes it more readable.