I have a dataset of multiple lakes with water level elevations through time. The observations are not regularly spaced and have many large gaps. Further, some of the older observations may be of lower or unknown quality. I created a separate model that does a reasonably good job of predicting water levels across time, but still misses the actual observations by varying amounts.
I would like to create a third inputed/interpolated set of data in which the solution is:
- informed by the modeled values where observations are missing
- crosses the highly weighted observations
- and is informed by the lower weighted observations
So far, I have used the fable package's TSLM->interpolate to perform this. It works reasonably well, but I cannot see a way to introduce weighting to the process. Further, it relies to heavily on the global coefficient and intercepts making it a bit too volatile when the modeled value significantly misses the observed. I am thinking that I need to use some sort of weighted loess that relies on local coefficients and can accomodate weighting.
library(dplyr)
library(tsibble)
library(fable)
library(ggplot2)
test_data <- data.frame(obs_year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009),
site_name = c("Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2"),
observed = c(100,200,NA, NA, NA, NA, 220, NA, NA, 125, NA,NA,425, NA, 475, NA, 450, 450, 475, 500),
weights = c(1,1,NA, NA, NA, NA, 2, NA, NA, 2, NA,NA,2, NA, 1, NA, 2, 2, 2, 2),
modeled = c(110,120,165,150, 200, 225, 240, 250, 150, 130, 450,430,415,400, 425, 450, 460, 460, 470, 490))
test_tsibble <- as_tsibble(test_data, key = site_name, index = obs_year)
tslm_interpolate <- test_tsibble %>%
group_by(site_name) %>%
model(lm = TSLM(observed~modeled)) %>%
fabletools::interpolate(test_tsibble)
tslm_interpolate <- left_join(tslm_interpolate, test_data, by = c("site_name", "obs_year")) %>%
dplyr::select(obs_year, site_name, observed = observed.y, imputed = observed.x, modeled, weights)
tslm_interpolate %>%
ggplot(aes(x=obs_year))+
geom_line(aes(y = imputed), color = "blue")+
geom_line(aes(y = modeled), color = "red")+
geom_point(aes(y = observed), color = "green")+
facet_wrap(~site_name, scales = "free_y")