Initial training data in time series cross-validation

JohnKar · February 17, 2021, 9:24am

My question is about evaluating the performance of one-step to multi-step-ahead forecasts. Below is an example from the FPP3 textbook section on time series cross-validation (with minimal edits for compactness), which evaluates 1- to 8-step-ahead drift forecasts.

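library(fpp3)

# Google closing prices for 2015, re-indexed by trading day so the series is regular
google_2015 <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2015) %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE) %>%
  filter(year(Date) == 2015)

# Expanding-window training sets: the first has 3 observations, each later one adds 1
google_2015_tr <- google_2015 %>%
  stretch_tsibble(.init = 3, .step = 1)

# Fit a drift model to every training set and forecast 1 to 8 steps ahead;
# h records the forecast horizon within each training set
fc <- google_2015_tr %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 8) %>%
  group_by(.id) %>%
  mutate(h = row_number()) %>%
  ungroup()

# Forecast accuracy measured against the full 2015 series
fc %>%
  accuracy(google_2015) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE)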

However, google_2015 has 252 observations, so .init = 3 means the earliest models are fitted to only three data points. Shouldn't we be using .init = 51 (20% of 252 is 50.4, rounded up) to ensure that at least 20% of the observations are used as training data?
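
For reference, the arithmetic behind .init = 51 can be sketched in base R as below. This is only a sketch: the 20% threshold is the rule of thumb assumed in the question rather than anything fpp3 requires, and n_obs, min_frac, init_20 and n_folds are illustrative names, not part of any package.

```r
# Sketch: derive .init from a minimum training fraction.
n_obs    <- 252    # observations in google_2015, as stated in the question
min_frac <- 0.20   # assumed minimum share of the data used for training

# Smallest initial window that covers at least min_frac of the data
init_20 <- ceiling(min_frac * n_obs)

# With .step = 1 the training windows have sizes init, init + 1, ..., n,
# so the number of training sets is the length of that sequence.
n_folds <- function(init, n = n_obs, step = 1) {
  length(seq(init, n, by = step))
}

init_20            # initial window size for a 20% minimum
n_folds(3)         # training sets with .init = 3
n_folds(init_20)   # training sets with .init = init_20
```

With that value, the call in the example would become stretch_tsibble(.init = init_20, .step = 1), which gives fewer training sets but avoids fitting the drift model to only a handful of observations.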

Referred here by Forecasting: Principles and Practice, by Rob J Hyndman and George Athanasopoulos
