Working with transformed variables

Laura.Armey · February 9, 2024, 8:41pm

I understand that if I use fable to forecast with a log or Box-Cox transformation, my predictions and prediction intervals come back in terms of the original variable.
I want to do something similar with per capita measures. I have estimates of future population that are pretty good (or I can create future population scenarios, as I do when forecasting from a time series linear regression).
Also, accuracy() returns an error saying it can't find the two variables. I've been working around this by using mutate() to create the per capita variable first and modelling that instead.

jrkrideau · February 10, 2024, 5:34pm

Welcome to the forum.

We probably need to see your code and some sample data. See the FAQ: Asking Questions.

A handy way to supply some sample data is the dput() function. Just do dput(mydata), where mydata is your data. For a large dataset, something like dput(head(mydata, 100)) should supply the data we need. Copy the output and paste it here between
```

```
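As a rough illustration of what dput() gives you, here is a minimal base R sketch. The data frame `mydata` and its values are made up for the example; the point is only that the pasted text can be evaluated to rebuild the same object on someone else's machine.

```r
# Hypothetical small data frame standing in for your real data
mydata <- data.frame(month = 1:3, leave = c(409, 446, 514))

# dput() prints R code that recreates the object; capture it as text
txt <- capture.output(dput(head(mydata, 100)))

# Anyone can paste that text back into R to rebuild the data
rebuilt <- eval(parse(text = txt))

# Check that the round trip preserved the data
all.equal(mydata, rebuilt)
```

head(mydata, 100) simply returns every row when the data has fewer than 100 rows, so the same call works for small and large datasets.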

Laura.Armey · February 11, 2024, 5:43pm

structure(list(year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2021, 
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021
), yearmonth = structure(c(16436, 16467, 16495, 16526, 16556, 
16587, 16617, 16648, 16679, 16709, 16740, 16770, 16801, 16832, 
16861, 16892, 16922, 16953, 16983, 17014, 17045, 17075, 17106, 
17136, 17167, 17198, 17226, 17257, 17287, 17318, 17348, 17379, 
17410, 17440, 17471, 17501, 17532, 17563, 17591, 17622, 17652, 
17683, 17713, 17744, 17775, 17805, 17836, 17866, 17897, 17928, 
17956, 17987, 18017, 18048, 18078, 18109, 18140, 18170, 18201, 
18231, 18262, 18293, 18322, 18353, 18383, 18414, 18444, 18475, 
18506, 18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 
18779, 18809, 18840, 18871, 18901, 18932, 18962), class = c("yearmonth", 
"vctrs_vctr")), employees = c(2827, 2827, 2827, 2827, 2827, 2827, 
2827, 2827, 2827, 2800, 2800, 2800, 2778, 2778, 2778, 2778, 2778, 
2778, 2778, 2778, 2778, 2784, 2784, 2784, 2882, 2882, 2882, 2882, 
2882, 2882, 2882, 2882, 2882, 2880, 2880, 2880, 2710, 2710, 2710, 
2710, 2710, 2710, 2710, 2710, 2710, 2673, 2673, 2673, 2727, 2727, 
2727, 2727, 2727, 2727, 2727, 2727, 2727, 2726, 2726, 2726, 2788, 
2788, 2788, 2788, 2788, 2788, 2788, 2788, 2788, 2612, 2612, 2612, 
2526, 2526, 2526, 2526, 2526, 2526, 2526, 2526, 2526, 2363, 2363, 
2363), leave = c(409, 446, 514, 935, 1437, 7918, 3617, 2150, 
520, 460, 622, 464, 740, 399, 685, 685, 735, 5516, 3537, 3109, 
423, 211, 202, 263, 717, 657, 563, 907, 1388, 6884, 4165, 2302, 
673, 287, 243, 468, 705, 967, 788, 968, 1209, 7187, 3256, 2556, 
593, 581, 769, 553, 578, 662, 1182, 622, 893, 6060, 3002, 2223, 
608, 374, 430, 399, 630, 574, 721, 413, 346, 428, 3901, 5768, 
1543, 458, 453, 388, 385, 430, 557, 675, 1589, 1776, 4534, 4688, 
736, 359, 355, 448)), class = c("grouped_ts", "grouped_df", "tbl_ts", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -84L), key = structure(list(
    .rows = structure(list(1:84), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L)), index = structure("yearmonth", ordered = TRUE), index2 = "yearmonth", interval = structure(list(
    year = 0, quarter = 0, month = 1, week = 0, day = 0, hour = 0, 
    minute = 0, second = 0, millisecond = 0, microsecond = 0, 
    nanosecond = 0, unit = 0), .regular = TRUE, class = c("interval", 
"vctrs_rcrd", "vctrs_vctr")), groups = structure(list(year = c(2015, 
2016, 2017, 2018, 2019, 2020, 2021), .rows = structure(list(1:12, 
    13:24, 25:36, 37:48, 49:60, 61:72, 73:84), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -7L), .drop = TRUE, class = c("tbl_df", 
"tbl", "data.frame")))

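So here is my code:

library(dplyr)
library(tsibble)
library(fable)

## Convert to a tsibble indexed by month (newdata is the dput() output above)
newdata <- newdata |>
  as_tsibble(key = NULL, index = yearmonth)

## Split datasets: train on the years before 2020
newdata_train <- newdata |>
  filter(year < 2020)

## Build a forecasting model on the per capita ratio
model1 <- newdata_train |>
  model(modelets = ETS((leave / employees) ~ error("A") + trend("A") + season("A")))

fc <- model1 |> forecast(h = 12)

## This is the call that returns the error
accuracy(fc, newdata)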

So I tried passing the future values of employees as new data:

future <- newdata |>
  filter(year > 2019) |>
  select(yearmonth, employees)

fc2 <- model1 |> forecast(new_data = future)

accuracy(fc2, newdata)

robjhyndman · February 11, 2024, 9:39pm

The issue here is that fable doesn't know that you want forecasts of leave, and it can't find a variable called leave / employees in the test data you pass accuracy(). Here is some code that will do what I think you intend.

## build a forecasting model
model1 <- newdata_train |>
  mutate(leave_per_capita = leave/employees) |>
  model(modelets = ETS(leave_per_capita ~ error("A") + trend("A") + season("A")))

fc <- model1 |>
  forecast(newdata |> filter(year >= 2020)) |>
  mutate(leave = leave_per_capita * employees) |>
  as_fable(response = "leave", distribution = leave)

accuracy(fc, newdata)
#> # A tibble: 1 × 10
#>   .model   .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
#>   <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#> 1 modelets Test  -68.5 1808.  922. -67.9  104.  2.50  3.24 0.0371

Created on 2024-02-12 with reprex v2.1.0
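To see why multiplying the per capita forecast by employees is enough, here is a small base R sketch of the arithmetic. It assumes the per capita forecast distribution is normal and that the future employees values are known constants (or a chosen scenario); all the numbers are hypothetical and are not taken from the model above.

```r
# Hypothetical per capita forecasts: mean and standard deviation for 3 months
pc_mean <- c(0.25, 0.30, 0.80)
pc_sd   <- c(0.10, 0.12, 0.15)

# Known (or scenario) future values of the denominator
employees <- c(2788, 2788, 2612)

# 95% prediction interval on the per capita scale
z <- qnorm(0.975)
pc_lower <- pc_mean - z * pc_sd
pc_upper <- pc_mean + z * pc_sd

# Multiplying a normal distribution by a constant scales both mean and sd
leave_mean  <- pc_mean * employees
leave_sd    <- pc_sd * employees
leave_lower <- leave_mean - z * leave_sd
leave_upper <- leave_mean + z * leave_sd

# The interval on the total scale is the per capita interval times employees
all.equal(leave_lower, pc_lower * employees)
all.equal(leave_upper, pc_upper * employees)
```

If the future denominator is itself uncertain, this simple rescaling understates the width of the intervals, so it is best suited to population figures you trust or to explicit scenarios.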

Laura.Armey · February 18, 2024, 8:04am

Thank you so much for your help!
