I am taking an online edX course on interrupted time series analysis that makes use of R and part of the course shows us how to derive predicted values from the gls model as well as get the absolute and relative change of the predicted vs the counterfactual.
Unfortunately, there is no example of how to get 95% confidence intervals around these predicted changes. On the course discussion board, the instructor linked to this article (Zhang et al, 2009.) where the authors provide SAS code, linked at the end of the 'Methods' section, to get these CIs, but the instructor does not have code that implements this in R.
Someone suggested I could use the R marginaleffects
package to do this. I am trying to apply the materials from this course to a work project, but this one piece of getting CIs around the pred, cfac, and absolute/relative changes (pred - cfac) is stumping me.
Just since I can't share my actual data from work, here's the example data the course gives us (csv), "flow" is the outcome, as in water flow of a river being studied:
year,flow,time,level,trend
1871,3533.916667,1,0,0
1872,3664.25,2,0,0
1873,3047.5,3,0,0
1874,3808.583333,4,0,0
1875,3677.5,5,0,0
1876,3663.416667,6,0,0
1877,2667.583333,7,0,0
1878,3900.083333,8,0,0
1879,4326.666667,9,0,0
1880,3590.583333,10,0,0
1881,3147.25,11,0,0
1882,2954.083333,12,0,0
1883,3498.416667,13,0,0
1884,3142.166667,14,0,0
1885,3213.916667,15,0,0
1886,3035,16,0,0
1887,3711.75,17,0,0
1888,2521.666667,18,0,0
1889,3023.5,19,0,0
1890,3507.5,20,0,0
1891,3493.666667,21,0,0
1892,4268.75,22,0,0
1893,3635.583333,23,0,0
1894,3946.416667,24,0,0
1895,3978.166667,25,0,0
1896,3856,26,0,0
1897,3248.166667,27,0,0
1898,3479.833333,28,1,1
1899,2453,29,1,2
1900,2649.166667,30,1,3
1901,2764.916667,31,1,4
1902,2196.666667,32,1,5
1903,2973.583333,33,1,6
1904,2613.416667,34,1,7
1905,2219.333333,35,1,8
1906,2901,36,1,9
1907,2186.75,37,1,10
1908,3215.333333,38,1,11
1909,3318.25,39,1,12
1910,3063.666667,40,1,13
1911,2605.583333,41,1,14
1912,2239.75,42,1,15
1913,1441.25,43,1,16
1914,2654.166667,44,1,17
1915,2191.75,45,1,18
1916,3570.333333,46,1,19
1917,3498.166667,47,1,20
1918,2564.583333,48,1,21
1919,2434.166667,49,1,22
1920,2594.333333,50,1,23
1921,2415.166667,51,1,24
1922,2683.583333,52,1,25
1923,2733.416667,53,1,26
1924,2729.416667,54,1,27
1925,2138.083333,55,1,28
1926,2665.916667,56,1,29
1927,2306.25,57,1,30
1928,2495.416667,58,1,31
1929,3241.083333,59,1,32
1930,2334.416667,60,1,33
This is what the model looks like, the instructor uses the gls function from the nlme
package:
# Fit the GLS regression model
model_p10 <- gls(flow ~ time + level + trend,
data=data,
correlation=corARMA(p=10,form=~time),
method="ML")
summary(model_p10)
Output:
Coefficients:
Value p-value
(Intercept) 3348.033 0.0000
time 6.745 0.2871
level -738.245 0.0000
trend -14.210 0.0397
That predicted changes code is then ran to calculate the predicted value 25 years AFTER the "weather change" i.e. the interruption, indicated by the level variable in the data which flips from 0 to 1 at time=28, thus there are 27 years before this "weather change"/interruption, 27 years BEFORE the weather change + 25 years AFTER the weather change = 52 years:
# Predicted value at 25 years after the weather change
pred <- fitted(model_p10)[52]
# Then estimate the counterfactual at the same time point
cfac <- model_p10$coef[1] + model_p10$coef[2]*52
# Absolute change at 25 years
pred - cfac
# Relative change at 25 years
(pred - cfac) / cfac
Output:
> # Absolute change at 25 years
> pred - cfac
52
-1093.483
> # Relative change at 25 years
> (pred - cfac) / cfac
52
-0.2956351
Questions:
-
I see that the
marginaleffects
package's predictions( ) function will give me the pred values and 95% CIs around them. How would I then also get CIs for the counterfactual (cfac)? -
And how would I specify the comparisons( ) function to get CIs for the absolute and relative (i.e. %) change between the predicted vs counterfactual (i.e. pred - cfac)? When I tried it out it kept giving me the same difference no matter the time point I input.
I did the following:
library(marginaleffects)
comparisons(model_orig_q1, variables = "level", newdata = datagrid(time = 52))
- Also, this is a bit unrelated, but is there a way to get level and trend coefficients in the model, which currently give the absolute differences between the pre vs post interruption level and trend respectively, as a relative percentage with 95% CIs using the
marginaleffects
package? So instead of saying the trend in water flow changed by -14.2 (-10, -21), I could instead say it decreased by 10% (5%, 15%) or whatever it'd be.