Your code had a lot of duplication and loaded many unnecessary packages. I've rewritten it to make it simpler and clearer.
# Load the packages
library(fpp3)
# Get data
train_data <- readxl::read_excel("forecast_data.xlsx", sheet = "train_data")
test_data <- readxl::read_excel("forecast_data.xlsx", sheet = "test_data")
public_holiday <- readxl::read_excel("forecast_data.xlsx", sheet = "regressor")
complete_data <- bind_rows(train_data, test_data) |>
  inner_join(public_holiday, by = "business_date") |>
  mutate(business_date = as.Date(business_date))
# Existing Stores
existing_stores <- complete_data |>
  filter(business_date == max(business_date)) |>
  pull(store_code) |>
  unique()
# Convert to tsibble
complete_data <- complete_data |>
  filter(store_code %in% existing_stores) |>
  as_tsibble(index = business_date, key = store_code) |>
  fill_gaps() |>
  fill(gross_sales, .direction = "down")
# Split back into train and test data
train_end <- as.Date(max(train_data$business_date))
train_data <- complete_data |>
  filter(business_date <= train_end)
test_data <- complete_data |>
  filter(business_date > train_end)
# Fit a dynamic regression model and an ETS model
fit <- train_data |>
  model(
    dynreg = ARIMA(gross_sales ~ holiday_flag + promotion_flag),
    ets = ETS(gross_sales)
  ) |>
  # Add a combination (average) of the two models
  mutate(
    comb = (dynreg + ets) / 2
  )
# Produce forecasts
fc <- fit |>
  forecast(new_data = test_data)
# Compare model accuracy
accuracy(fc, complete_data) |>
  group_by(.model) |>
  summarise(
    RMSE = sqrt(mean(RMSE^2)),
    MAE = mean(MAE),
    RMSSE = sqrt(mean(RMSSE^2)),
    MASE = mean(MASE)
  ) |>
  arrange(RMSSE)
Note that I also included an ETS model, and a combination (or ensemble) of the dynamic regression and ETS models. While the ETS model is not particularly good on its own, as it takes no account of the covariates, the combination performs slightly better than the dynamic regression model alone.
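To see how the three models compare visually, you can plot the test-set forecasts for a single store against the actual data. Here is a quick sketch using the first store code as a placeholder; swap in any store you care about:
# Plot test-set forecasts for one store (placeholder store choice)
fc |>
  filter(store_code == existing_stores[1]) |>
  autoplot(complete_data, level = 80)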
You were fitting two dynamic regression models: one where you suppressed seasonality, and one where you allowed it. As the data does appear to be seasonal, the first model is not particularly good; the second worked well.
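In fable syntax, the difference comes down to the PDQ() special. Here is a sketch using the same variables as above, where PDQ(0, 0, 0) is what suppresses the seasonal ARIMA terms:
# Compare non-seasonal and seasonal dynamic regression specifications
fit_compare <- train_data |>
  model(
    # Seasonal ARIMA terms disallowed
    nonseasonal = ARIMA(gross_sales ~ holiday_flag + promotion_flag + PDQ(0, 0, 0)),
    # Seasonal terms selected automatically (the default)
    seasonal = ARIMA(gross_sales ~ holiday_flag + promotion_flag)
  )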
The reason for the two warnings was that ARIMA() struggled to fit models to two of the series, and the resulting standard errors were missing. Possibly this was due to too many missing values coinciding with promotions or public holidays, making it hard to estimate the corresponding coefficients. Or it may have been due to some unusual outliers making the models difficult to estimate. I have not investigated this issue further, and I don't know which two series were causing the problems. You can check which series are producing the worst forecasts using
# Compare model accuracy
accuracy(fc, complete_data) |>
  group_by(store_code) |>
  summarise(
    RMSE = sqrt(mean(RMSE^2)),
    MAE = mean(MAE),
    RMSSE = sqrt(mean(RMSSE^2)),
    MASE = mean(MASE)
  ) |>
  arrange(desc(RMSSE))
Then try looking at the data for the two worst series to see if there are issues there.
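For example, assuming the two worst store codes from the table above turn out to be "S001" and "S002" (placeholders), you could plot their raw series, and also list which coefficients came back without standard errors:
# Placeholder store codes: replace with the two worst from the table above
worst_stores <- c("S001", "S002")
complete_data |>
  filter(store_code %in% worst_stores) |>
  autoplot(gross_sales)
# The warnings imply some missing standard errors; this shows which
# series and coefficients were affected
tidy(fit) |>
  filter(is.na(std.error))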
You could also consider using a hierarchical forecasting approach: look at aggregate sales across all stores, and then reconcile the forecasts of the individual stores with the forecasts of the aggregate. See Chapter 11, "Forecasting hierarchical and grouped time series", of Forecasting: Principles and Practice (3rd ed) for how to do this.
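A minimal sketch of what that might look like, assuming the objects above. ETS is used here because it needs no future regressors at the aggregate level, and the model and reconciliation method are just placeholders:
# Aggregate sales across all stores, model every level, then reconcile
# so that the store forecasts add up to the total forecast
fit_hts <- train_data |>
  aggregate_key(store_code, gross_sales = sum(gross_sales)) |>
  model(ets = ETS(gross_sales)) |>
  reconcile(mint = min_trace(ets, method = "mint_shrink"))
fc_hts <- fit_hts |>
  forecast(h = length(unique(test_data$business_date)))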