GAM spatio-temporal model: lots of questions about space-time interaction, ziP model, and random effect

angela_italy · January 13, 2024, 7:06pm

I would like to fit a gam to model spatio-temporal count data.
The distribution of my outcome variable is the following

As you can see there are many zeros and I might need to go for a zero inflated model.

But to start off , I have tried with a simple poisson model with a smooth function for my time variable (integer between 1 and 8) and a smooth function for my coordinates.

The model runs without any issue though the residuals look strange (see plot below)


m1=gam(Y ~ s(time,k=7)+ 
         s(long,lat),
       data=df_points, 
       method = "REML", 
       family = "poisson") 

res1=resid(m1)


plot(x=res1,y=fit1, cex=0.7, pch=16, 
     xlab="Fitted values", ylab="Residuals")

When I try to create a model with a tensor product interactions (space and time) I got a warning message highlighting that the fitting terminated with a step failure.

m2=gam(Y ~ s(time,k=7)+ 
         s(long,lat, bs="ds")+ # Duchon splines for spatial data
         ti(long,lat,time,d=c(1,1,1)), # As 1 unit of Longitude isn’t the same as 1 unit Latitude for a big chunk of the planet, use a 3 way tensor product
       data=df_points, 
       method = "REML", 
       family = "poisson")

Warning messages:
1: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully
2: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  gam.fit3 algorithm did not converge

First question: How can I overcome this issue and create a interaction between space and time?

Going back to m1 I assessed if the data are overdispersed and it turns out that they are. I computed compute the Pearson estimate for the dispersion parameter using the Pearson residuals and it is 3

res1=resid(m1, type="pearson")
Overdispersion=sum(res1^2)/(m1$df.residual)
Overdispersion
[1] 3.075878

Given the above I tried to fit a negative binomial and in this case the Overdispertion calculation gives me 0.85

# Try with the negative binomial 
m1_nb=gam(Y ~ s(time,k=7)+ 
         s(long,lat),
       data=df_points, 
       method = "REML", 
       family = "nb") 

# Check for overdispersion

res1_nb=resid(m1_nb, type="pearson")
Overdispersion_nb=sum(res1_nb^2)/(m1_nb$df.residual)
Overdispersion_nb # it should be 1
[1] 0.8549449

Second question: 0.85 is close to 1.. can I consider my model ok or do I have underdispertion now?

Given that there are many 0 which can also be FALSE ZERO, I tried to fit a zero-inflated model using the ziP family but when I tried to extract the Pearson residuals to check for overdispertion I couldn't ...
Indeed a function is returned instead of the residuals

# Check for overdispersion
res1_ziP=stats::resid(m1_ziP, type="pearson")
res1_ziP

standardGeneric for "res" defined from package "terra"

function (x) 
standardGeneric("res")
<bytecode: 0x56158a3562d8>
<environment: 0x56158a35bf58>
Methods may be defined for arguments: x
Use  showMethods(res)  for currently available ones.

Third question: do you know how I can retrieve the Pearson residuals from a zero-inflated model and test for overdispersion?

Lastly,

Fourth question: The original data are not exact spatial point data, instead they are aggregated to 1,000 km-sized grid cells. In order to implement a spatial gam and take into account of the coordinates, I extracted the centroids of the cells and created the long and lat variables that I used as input for the above models. I am wondering whether it may be better to include the id of the cells within the grid as random effect rather than using the long and lat of the centroids. At the end they are repeated measures over time but I would like to include a sort of proxy for the spatial similarity if the sites are close to each other.. What is the best way to take into account the spatial dimension of the data in a gam? I would have built a neighbouring matrix of the grid cells but this is not supported by gam.

Thanks a lot for your help

2: https://i.stack.imgur.com/CP7cZ.pngI would like to fit a gam to model spatio-temporal count data.
The distribution of my outcome variable is the following