I am trying to run a two-part model for expenditures in my data using R. In my cohort, the observations can either have an expenditure (indicated by a flag as yes/no) and, if yes, the amount of that expenditure.
In Stata, this model itself is easy:
twopm cost i.status i.covA covB covC, firstpart(logit) secondpart(glm, family(gamma) link(log) eform)
margins i.status
The margins command here gives me the marginal means of each level in the "status" variable.
I am trying to replicate these results in R:
zigamma_model <- glmmTMB(cost ~ status + covA + covB + covC,
family = ziGamma(link = "log"),
ziformula = ~ status + covA + covB + covC,
data = cohort)
emmeans(zigamma_model, ~ status, type = "response")
The coefficients of the glmmTMB model itself is almost identical to the coefficients from the twopm in Stata. However, the emmeans command gives me a completely different (and much higher) set of marginal means compared to the margins command.
I tried manually calculating the marginal means using the model from glmmTMB and the result I get is identical to the estimates from the post-estimation in Stata, so I'm not sure why emmeans is giving something different. I think it may not be able to handle zero-inflated models.
Any thoughts or suggestions on alternatives? The emmeans function is handy since I can easily do pairwise comparisons, but not if it's comparing the wrong estimates.