Overview
I am conducting a time series analysis using an ARIMA model, and I would like to visualise the Mean Absolute Percentage Error (MAPE) in a plot (see below).
I am currently following this tutorial:
Issues

I have successfully run the R-code; however, the value produced by the object MAPE (see R-code below) is not a number, so the plot title shows an ARIMA Holdout MAPE value of NaN %, which is definitely incorrect:
  MAPE
1  NaN
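
For reference, a minimal sketch of one way a NaN like this can arise. It assumes the holdout filter matches no rows, which would be the case if no date in the data frame falls after 2017. The data frame `toy` and its values are made up for illustration and only borrow the column names of d4.

```r
# Toy data frame with the same column names as d4; the values are invented.
toy <- data.frame(
  Fitted = c(35.9, 28.0, 39.0),
  Actual = c(36, 28, 39),
  Date   = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01"))
)

# No row has a year after 2017, so this filter keeps zero rows.
holdout <- toy[as.integer(format(toy$Date, "%Y")) > 2017, ]
n_holdout <- nrow(holdout)

# The mean of an empty numeric vector is NaN, which is what the MAPE becomes.
mape_empty <- mean(abs(holdout$Actual - holdout$Fitted) / holdout$Actual)
print(n_holdout)
print(mape_empty)
```

If this is what happens with the real data, the filter condition would need to select the months that were actually held out from the fit.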

The numerical range of the y-axis on the ARIMA MAPE plot (see below) is spurious, ranging from zero to 100,000, i.e., 0e+00 to 1e+05 (which is obviously incorrect), and the date range on the x-axis is wrong because the date range for my data is 2015-2017. I have passed the date range of 2015-2017 to both the ts() and window() functions. This spurious date range also appears to be responsible for the strange spike exhibited by the fitted data in the ARIMA MAPE plot below.

Moreover, the column named 'Date' in the object d4 (see R-code below) shows the year 1970 (see below), which does not make sense because this is not the date range I passed to my models:
#These values are produced by the object d4
         Fitted Actual       Date
1  3.592659e+01     36 1970-01-02
2  2.798055e+01     28 1970-01-03
3  3.897144e+01     39 1970-01-04
4  4.596831e+01     46 1970-01-05
5  5.008207e+00      5 1970-01-06
6  1.009074e-03      0 1970-01-07
7  1.007765e-03      0 1970-01-08
8  2.194326e+01     22 1970-01-09
9  9.985120e+00     10 1970-01-10
10 1.497379e+01     15 1970-01-11
11 7.992608e+00      8 1970-01-12
12 3.292665e+01     33 1970-01-13
13 3.320180e+01     33 1970-01-14
14 2.568558e+01     29 1970-01-15
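
A minimal sketch of how dates in January 1970 can appear. It assumes that time() is called on a plain vector rather than on a ts object. The name `year_col` is hypothetical, and the origin is written out explicitly here so the sketch behaves the same across R versions.

```r
# A plain vector of monthly Date values (not a ts object).
year_col <- seq.Date(as.Date("2015-01-01"), by = "month", length.out = 3)

# time() on a non-ts vector returns the positions 1, 2, 3,
# not the calendar dates stored in the vector.
idx <- as.numeric(time(year_col))
print(idx)

# Converting those positions to Date treats them as days since 1970-01-01.
wrong_dates <- as.Date(idx, origin = "1970-01-01")
print(wrong_dates)
```

The first converted value is 1970-01-02, one day after the origin, which matches the pattern in the Date column above.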
The dates on the x-axis should follow the same format as the column called 'Year' (below), which is produced when you run the BSTS_new_df object.
        Year    Month Frequency
1 2015-01-01  January        36
2 2015-02-01 February        28
3 2015-03-01    March        39
4 2015-04-01    April        46
5 2015-05-01      May         5
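
A minimal sketch of one way to build a Date column in this format that lines up with a monthly ts object. It assumes a series with frequency 12. The names `monthly` and `dates` are hypothetical, and the values are the first five rows shown above.

```r
# Monthly series starting in January 2015 (first five Frequency values above).
monthly <- ts(c(36, 28, 39, 46, 5), start = c(2015, 1), frequency = 12)

# Read the start year and month from the ts object itself.
first_year  <- as.integer(start(monthly)[1])
first_month <- as.integer(start(monthly)[2])

# One date per observation, on the first of each month.
dates <- seq.Date(
  as.Date(sprintf("%d-%02d-01", first_year, first_month)),
  by = "month",
  length.out = length(monthly)
)
print(dates)
```

Because the sequence is generated from the start and length of the series, it always has one entry per observation, so it can sit next to the actual values in a data frame.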
If anyone can please help me solve these issues, I would like to express my deepest appreciation.
Many thanks in advance.
R-code
#Set seed
set.seed(45L)
##Open packages for the time series analysis
library(lubridate)
library(bsts)
library(dplyr)
library(ggplot2)
library(ggfortify)
##Change the Year column into YYYY-MM-DD format for the first of every month per year
BSTS_df$Year <- lubridate::ymd(paste0(BSTS_df$Year, BSTS_df$Month, "01"))
##Order the Year column in YYYY-MM-DD format into the correct sequence: Jan-Dec
allDates <- seq.Date(
  min(BSTS_df$Year),
  max(BSTS_df$Year),
  "month")
##Produce and arrange the new data frame ordered by the first of every month in YYYY-MM-DD format
BSTS_new_df <- BSTS_df %>%
  right_join(data.frame(Year = allDates), by = c("Year")) %>%
  dplyr::arrange(Year) %>%
  tidyr::fill(Frequency, .direction = "down")
##Create a time series object
myts_Arima <- ts(BSTS_new_df$Frequency, start=c(2015, 1), end=c(2017, 12), frequency=12)
##Pass the data to the window() function
Arima_Window <- window(myts_Arima, start=c(2015, 01), end=c(2017, 12))
### Fit the ARIMA model
arima <- arima(log10(x+.001),
               order=c(0, 1, 1),
               seasonal=list(order=c(0,1,1), period=12))
### Actual versus predicted
d4 <- data.frame(c(10^as.numeric(fitted(arima)), # fitted and predicted
                   10^as.numeric(predict(arima, n.ahead = 12)$pred)),
                 as.numeric(BSTS_df$Frequency), #actual values
                 as.Date(time(BSTS_df$Year)))
##Name the columns
names(d4) <- c("Fitted", "Actual", "Date")
### MAPE (mean absolute percentage error)
MAPE <- dplyr::filter(d4, lubridate::year(Date)>2017) %>%
  dplyr::summarise(MAPE=mean(abs(Actual-Fitted)/Actual))
##Open a plotting window
dev.new()
### Plot actual versus predicted
ggplot(data=d4, aes(x=Date)) +
  geom_line(aes(y=Actual, colour = "Actual"), size=1.2) +
  geom_line(aes(y=Fitted, colour = "Fitted"), size=1.2, linetype=2) +
  theme_bw() + theme(legend.title = element_blank()) +
  ylab("") + xlab("") +
  geom_vline(xintercept=as.numeric(as.Date("2015-12-01")), linetype=2) +
  ggtitle(paste0("ARIMA -- Holdout MAPE = ", round(100*MAPE,2), "%")) +
  theme(axis.text.x=element_text(angle = 90, hjust = 0))
ARIMA MAPE Plot
Data frame - BSTS_df
structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017), Month = structure(c(5L,
4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L,
9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L,
2L, 12L, 11L, 10L, 3L), .Label = c("April", "August", "December",
"February", "January", "July", "June", "March", "May", "November",
"October", "September"), class = "factor"), Frequency = c(36,
28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33, 33, 29, 31, 23, 8, 9,
7, 40, 41, 41, 30, 30, 44, 37, 41, 42, 20, 0, 7, 27, 35, 27,
43, 38)), row.names = c(NA, 36L), class = "data.frame")