Overview:
I am conducting a Bayesian time series analysis with mcmc simulations. To visualise the mean absolute percentage error (MAPE) for the analysis, I am producing a plot using ggplot().
I am following this tutorial below:
Problems:
In the past, I have successfully produced this plot (see plot 1 below) with my R-code that can be found in this previously asked Stack Overflow question.. The mean holdout value was 32.52 %, and the actual and fitted data plotted nicely (plot 1).
However, I have not used this code for a couple of months, and today, I needed to extract the plot for work. To my dismay, the R-code that was previously working absolutely fine is no longer working (see plot 2). I have tried to adapt the R-code below in the data frame d2 (see R-code below) as well as checking the R-code in the plotting object itself (produced using ggplot()), and I just cannot find the answer.
For example:
- The y-axis is ranging from 0 to 15,000.00 in plot 2, whereas, the correct data range should be between 0-120 as shown in plot 1 (when the code was working).
- The MAPE value in plot 2 is Inf %, when the correct MAPE value is 32.52 % (see plot 1 - produced with the R-code in the link above to my previous Stack Overflow question).
However, the only difference between my R-code in this question and my previous question was that I needed to add .001 to the x object (see R-code below) since the bsts() function would not accept zero values for the months of June and July (see the data frame below - BSTS_df).
When I run the MAPE object, I have noticed that the outcome is infinity (see below), which is very strange:
**MAPE
1 Inf**
I have also noticed that the values in the column UL produced from the posterior.interval object is ranging up to approximately 17,500.00, which I assume is not correct (see below):-
LL UL Date
1 3.05009089 9214.58787 2017-01-01
2 2.74650830 7143.70646 2017-02-01
3 5.68858210 6537.04558 2017-03-01
4 2.76432668 2836.65981 2017-04-01
5 0.78042088 2249.52498 2017-05-01
6 0.04854900 88.57707 2017-06-01
7 0.03650557 145.12395 2017-07-01
8 2.70009631 5338.68317 2017-08-01
9 3.08234007 2329.57492 2017-09-01
10 2.26410227 17146.68785 2017-10-01
11 1.22190125 15561.66013 2017-11-01
12 2.14859047 3326.09852 2017-12-01
If anyone can help me fix these errors, I would be deeply appreciative since my R-code was working extremely well beforehand.
Many thanks in advance.
Plot 1
Plot 2
R-Code
##Open packages for the time series analysis
library(lubridate)
library(bsts)
library(dplyr)
library(ggplot2)
library(ggfortify)
###################################################################################
##Time Series Bayesian Inference Model with mcmc using the bsts() function
##################################################################################
##Change the Year column into YY/MM/DD format for the first of evey month per year
BSTS_df$Year <- lubridate::ymd(paste0(BSTS_df$Year, BSTS_df$Month,"-01"))
##Order the Year column in YY/MM/DD format into the correct sequence: Jan-Dec
allDates <- seq.Date(
min(BSTS_df$Year),
max(BSTS_df$Year),
"month")
##Produce and arrange the new data frame ordered by the first of evey month in YY/MM/DD format
BSTS_new_df <- BSTS_df %>%
right_join(data.frame(Year = allDates), by = c("Year")) %>%
dplyr::arrange(Year) %>%
tidyr::fill(Frequency, .direction = "down")
##Create a time series object
myts2 <- ts(BSTS_new_df$Frequency, start=c(2015, 1), end=c(2017, 12), frequency=12)
##Upload the data into the windows() function
x <- window(myts2, start=c(2015, 01), end=c(2016, 12))
y <- log(x+.001)
##Produce a list for the object ss
ss <- list()
ss <- AddSeasonal(ss, y, nseasons = 12)
ss <- AddLocalLevel(ss, y)
##Produce the bsts model using the bsts() function
bsts.model <- bsts(y, state.specification = ss, niter = 100, ping = 0, seed = 123)
##Open plotting window
dev.new()
##Plot the bsts.model
plot(bsts.model)
##Get a suggested number of burns
burn<-bsts::SuggestBurn(0.1, bsts.model)
##Predict
p<-predict.bsts(bsts.model, horizon = 12, burn=burn, quantiles=c(.25, .975))
p$mean
##Actual vs predicted
d2 <- data.frame(
# fitted values and predictions
c(exp(as.numeric(-colMeans(bsts.model$one.step.prediction.errors[-(1:burn),])+y)),
exp(as.numeric(p$mean))),
# actual data and dates
as.numeric(BSTS_new_df$Frequency),
as.Date(BSTS_new_df$Year))
###Rename the columns
names(d2) <- c("Fitted", "Actual", "Date")
### MAPE (mean absolute percentage error)
MAPE <- dplyr::filter(d2, lubridate::year(Date)>=2017) %>%
dplyr::summarise(MAPE=mean(abs(Actual-Fitted)/Actual))
### 95% forecast credible interval
posterior.interval <- cbind.data.frame(
exp(as.numeric(p$interval[1,])),
exp(as.numeric(p$interval[2,])),
tail(d2,12)$Date)
##Rename the columns
names(posterior.interval) <- c("LL", "UL", "Date")
### Join intervals to the forecast
d3 <- left_join(d2, posterior.interval, by="Date")
##Open plotting window
dev.new()
### Plot actual versus predicted with credible intervals for the holdout period
ggplot(data=d3, aes(x=Date)) +
geom_line(aes(y=Actual, colour = "Actual"), size=1.2) +
geom_line(aes(y=Fitted, colour = "Fitted"), size=1.2, linetype=2) +
theme_bw() + theme(legend.title = element_blank()) + ylab("") + xlab("") +
geom_vline(xintercept=as.numeric(as.Date("2017-12-01")), linetype=2) +
geom_ribbon(aes(ymin=LL, ymax=UL), fill="grey", alpha=0.5) +
ggtitle(paste0("BSTS -- Holdout MAPE = ", round(100*MAPE,2), "%")) +
theme(axis.text.x=element_text(angle = -90, hjust = 0))
Data Frame - BSTS_df
structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017), Month = structure(c(5L,
4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L,
9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L,
2L, 12L, 11L, 10L, 3L), .Label = c("April", "August", "December",
"February", "January", "July", "June", "March", "May", "November",
"October", "September"), class = "factor"), Frequency = c(36,
28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33, 33, 29, 31, 23, 8, 9,
7, 40, 41, 41, 30, 30, 44, 37, 41, 42, 20, 0, 7, 27, 35, 27,
43, 38), Days_Sea_Month = c(31, 28, 31, 30, 6, 0, 0, 29, 15,
29, 29, 31, 31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27, 31,
28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)), row.names = c(NA,
-36L), class = "data.frame")