Linear Mixed Model (lmer) and problems with ggeffect and predicted values

matschelarmando · August 7, 2023, 2:37pm

Hi everyone,

I have been making the statistics for my thesis and came quite along, even though I am a bloody beginner in R, of course with a lot of help from my supervisor/professor.

I set up individual lm and lmer investigating the effects of y~x in the past, but at the moment I am setting up a linear mixed model including all my variables to see how they explain changes in my dependant variable including interactions. I want to look at how abiotic and biotic factors (like latitude, elevation, or plant species richness) influence beetle species richness (BSR), with region as a random factor. I had to transform my data for example BSR is log2 transformed as is latitude, to get near normal distribution (at least my supervisor said it was okay).

After setting up the model I used ggeffect() to get predictions for the individual variables, but I am having issues getting a nice range of values. The following is the reprex (I hope I did it right...?):

dataframe <- data.frame(           Region = c("Västerbotten","Västerbotten",
                       "Central Iberian Peninsula","Central Iberian Peninsula",
                       "Seville","Seville","Seville","Navarra","Navarra",
                       "Galicia","Galicia","Davos","Davos","Davos","Davos",
                       "Davos","Bern","Le Vaud","Le Vaud","Le Vaud",
                       "Le Vaud","Le Vaud"),
               BSR = c(0,0,15,11,10,7,9,30,12,
                       7,2,3,3,1,10,15,6,6,5,8,2,9),
         Abundance = c(0,0,63,26,15,8,19,117,
                       33,10,2,3,3,1,40,45,6,7,5,13,2,51),
          Monthnum = c(8,8,6,6,5,5,5,6,6,6,6,
                       7,7,8,8,8,7,7,6,6,6,7),
              long = c(20.435508,20.440477,-3.478333,
                       -3.469722,-6.31592777777777,-6.3174,
                       -6.31908055555555,-1.44398,-1.45902,-9.060556,-9.088944,9.878583,
                       9.85403057110996,9.8513,9.87216,9.82407,7.459026,
                       6.206838,6.246505,6.21909,6.214325,6.205685),
         lat_trans = c(5.99605894988866,
                       5.99608216377688,5.35907598731684,5.3591931744351,5.2121342625697,
                       5.21210420872034,5.21205480259885,5.41714736248073,
                       5.41639862585785,5.41289815540624,5.41189602815859,
                       5.54763872937554,5.54722590843021,5.54759696616991,
                       5.54762719372172,5.54806230775415,5.55536291078723,
                       5.53950666034148,5.53809355385378,5.5365167267499,5.53886127499334,
                       5.5389647323021),
          el_trans = c(6.32455532033676,
                       6.48074069840786,40.8778668719394,41.7013189239861,1,
                       1.4142135623731,1.4142135623731,26.8328157299975,27.3861278752583,
                       2.23606797749979,2.64575131106459,41.1096095821889,50,
                       50,43.1277173056956,39.6358423652128,23.0217288664427,
                       37.1483512420134,26.6082693913001,21.6101827849743,
                       33.436506994601,35.9444015112229),
         PSR_trans = c(3.3166247903554,
                       2.64575131106459,6.92820323027551,6.557438524302,3.46410161513775,
                       3.3166247903554,3.74165738677394,6.48074069840786,
                       7.61577310586391,5.74456264653803,4.79583152331272,
                       6.2449979983984,6.40312423743285,6.85565460040104,
                       7.07106781186548,4.47213595499958,6.70820393249937,
                       8.83176086632785,7.28010988928052,6.78232998312527,9.2736184954957,
                       10),
         mean_temp = c(4.5,4.5,9.7,9.7,18.775,
                       18.775,18.775,11.45,11.283,14.57,14.625,-0.1,-0.1,
                       -0.1,-0.1,-0.1,7.8,7.866667,7.866667,9.38,7.866667,
                       7.866667),
       mean_precip = c(59.583,59.583,50.75,50.75,
                       44.17,44.17,44.17,66.33,66.33,128.5,128.5,120.9167,
                       120.9167,120.9167,120.9167,120.9167,97.83,129.17,
                       129.17,129.17,129.17,129.17),
        biom_trans = c(4.81227596881143,
                       4.5263671967705,5.55940644313761,4.46116576692685,5.06882530770197,
                       6.24911033667993,3.9572409018855,2.79508497187474,
                       7.15558523113239,NA,NA,6.00474812127869,3.76473770666696,
                       3.93921311939326,6.61585217489025,5.65141575182715,
                       9.19778052938135,6.56033535728167,7.12165711053263,
                       7.45318723768563,6.64928567592038,5.61773975901341))

The following is my whole code I am using to plot the effect of latitude(transformed) on the beetles species richness:

dataframe$Monthnum_f <- factor(dataframe$Monthnum)

mod <- lmer(log2(BSR+1) ~ Monthnum_f + lat_trans + long + biom_trans + PSR_trans + mean_precip + mean_temp +  I(el_trans^2) + el_trans + (1|Region), data=dataframe)

# Get the marginal effects using ggeffect()
effects <- ggeffect(mod, terms = "lat_trans")

ggplot() + 
  geom_point(data=dataframe, aes(lat_trans, log2(BSR+1))) + 
  geom_line(data=effects, aes(x = x, y = predicted), color = "red") +
  geom_ribbon(data=effects, aes(x = x, ymin = conf.low, ymax = conf.high), alpha = .2, color = NA) +
  guides(color="none")+
  theme_bw()+
  theme_classic()+
  scale_y_continuous(breaks=c(0,1,2,3,4,5),
                     labels=c("0", "1", "3", "7", "15", "31"))+
  scale_x_continuous(breaks=log2(c(0,32,40,50,60)),
                     labels=c(0,32,40,50,60))+
  #guides("color"="none")+
  labs(title = "LMM: BSR ~ Latitudetrans") +
  xlab("log2(latitude)") +
  ylab("log2(BSR+1)") +
  theme(plot.title = element_text(hjust = 0.5))

!Be aware that this is only one part of my dataset! Which is why there are so few data points at the regression line shows a positive relationship, whereas with the whole dataset it shows a negative one!

It does plot me the effect and the effect is exactly as it is expected in theory or what I want, which is nice. But as you can see the plot or the x-axis is quite moved to the right (There is a gap at the beginning), because the predicted values generated by ggeffect() start where I have no data for "lat_trans", which is at 5.0. In general when I use the code below + as you can see in the plot by the blue dots:

# Extract the data frame with predicted values
                      predicted_values <- data.frame(lat_trans = effects$x, predicted = effects$predicted)
                      
                      # Print the first few rows of the predicted_values data frame
                      head(predicted_values)

it show only 3 predicted points for lat_trans but I have way more, or at least I would expect it to predict more points...

Any idea how to fix this issue, so that I get more predicted values and so that there is no gap at the beginning of the x-axis? I also tried to used the "grid.levels" argument but it was useless... This is driving me crazy!

Thank you for any help!!

Cheers,
Marcel

nirgrahamuk · August 7, 2023, 3:01pm

I think you want to replace the scale x code with controlling the limits of the cartesian space

  coord_cartesian(xlim=range(dataframe$lat_trans))+

matschelarmando · August 7, 2023, 3:20pm

I feeling dumb for not finding this solution... Thank you, at least the plot looks nice now!!
However, I am a little concerned by the 3 predicted values. Do they have influences on my confidence intervalls? I think so as they are calculated by using the stadard error of the residuals, which are the distance between the fit and the actual data points. Or does it not play a role and I understood it wrong the way they are calculated?

Thank you for additionall help!!

nirgrahamuk · August 7, 2023, 3:48pm

ggeffects have defaults to make its output 'prettier'
you can directly use effects package rather than ggeffects to get all the detail


(myeffects <- Effect("lat_trans",mod)  |> as.data.frame())

ggplot() + 
  geom_point(data=dataframe, aes(lat_trans, log2(BSR+1))) + 
  geom_line(data=myeffects, aes(x=lat_trans, y = fit), color = "red") +
  geom_ribbon(data=myeffects, aes(x=lat_trans, ymin = lower, ymax = upper), alpha = .2, color = NA) +
  guides(color="none")+
  theme_bw()+
  theme_classic()+ 
  coord_cartesian(xlim=range(dataframe$lat_trans))+
  #guides("color"="none")+
  labs(title = "LMM: BSR ~ Latitudetrans") +
  xlab("log2(latitude)") +
  ylab("log2(BSR+1)") +
  theme(plot.title = element_text(hjust = 0.5))

matschelarmando · August 8, 2023, 9:37am

Thank you! To be honest I already tryed it and taking a look at it, the confidence intervals are quite different than to ggeffect().. Is that a problem? I am quite unsure why that is and if that is statistically a problem?

matschelarmando · August 8, 2023, 10:55am

And when I run your code you juste provided I encounter this Error:

Error in `fortify()`:
! `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class eff.
Run `rlang::last_error()` to see where the error occurred.

What Is wrong with it? I can't find anything...

nirgrahamuk · August 8, 2023, 11:05am

I just tested my code in a new R session, and it works.
are you sure you are running my code exactly ?
as.data.frame() should have made it sufficiently dataframe to be ingested by ggplot functions...

matschelarmando · August 8, 2023, 12:34pm

Yes… I used your exact code, but changed the data frame in the ggplot to use my complete dataframe. My R also doesn‘t recognize the „|>“ part…

nirgrahamuk · August 8, 2023, 2:22pm

This is the R's native pipe; if you upgrade your R version, you would be able to use it. until then you can use %>%

nirgrahamuk · August 8, 2023, 2:23pm

I dont see how simply more data in the same set up would break the code.
you could try to make a reprex if you need more support

matschelarmando · August 9, 2023, 10:37am

Thanks for the help! I don't see either.. I also resarted my session but it doesn't work.
I will come to you with a reprex tomorrow! thx

matschelarmando · August 10, 2023, 8:54am

Hey so I tried it again with the subset of the data and with your code but changed the ,,|>" or "%>%" to a simple object, as both don't work... I know I have to update it, but my supervisor told me that if I update, there might be some packages I used that could be stop working as they did until now. So I'll wait until I finished this analysis.

This is the exact code I used:

data.frame(
  stringsAsFactors = FALSE,
            Region = c("Västerbotten","Västerbotten",
                       "Central Iberian Peninsula","Central Iberian Peninsula",
                       "Seville","Seville","Seville","Navarra","Navarra",
                       "Galicia","Galicia","Davos","Davos","Davos","Davos",
                       "Davos","Bern","Le Vaud","Le Vaud","Le Vaud",
                       "Le Vaud","Le Vaud"),
               BSR = c(0,0,15,11,10,7,9,30,12,
                       7,2,3,3,1,10,15,6,6,5,8,2,9),
         Abundance = c(0,0,63,26,15,8,19,117,
                       33,10,2,3,3,1,40,45,6,7,5,13,2,51),
          Monthnum = c(8,8,6,6,5,5,5,6,6,6,6,
                       7,7,8,8,8,7,7,6,6,6,7),
              long = c(20.435508,20.440477,-3.478333,
                       -3.469722,-6.31592777777777,-6.3174,
                       -6.31908055555555,-1.44398,-1.45902,-9.060556,-9.088944,9.878583,
                       9.85403057110996,9.8513,9.87216,9.82407,7.459026,
                       6.206838,6.246505,6.21909,6.214325,6.205685),
         lat_trans = c(5.99605894988866,
                       5.99608216377688,5.35907598731684,5.3591931744351,5.2121342625697,
                       5.21210420872034,5.21205480259885,5.41714736248073,
                       5.41639862585785,5.41289815540624,5.41189602815859,
                       5.54763872937554,5.54722590843021,5.54759696616991,
                       5.54762719372172,5.54806230775415,5.55536291078723,
                       5.53950666034148,5.53809355385378,5.5365167267499,5.53886127499334,
                       5.5389647323021),
          el_trans = c(6.32455532033676,
                       6.48074069840786,40.8778668719394,41.7013189239861,1,
                       1.4142135623731,1.4142135623731,26.8328157299975,27.3861278752583,
                       2.23606797749979,2.64575131106459,41.1096095821889,50,
                       50,43.1277173056956,39.6358423652128,23.0217288664427,
                       37.1483512420134,26.6082693913001,21.6101827849743,
                       33.436506994601,35.9444015112229),
         PSR_trans = c(3.3166247903554,
                       2.64575131106459,6.92820323027551,6.557438524302,3.46410161513775,
                       3.3166247903554,3.74165738677394,6.48074069840786,
                       7.61577310586391,5.74456264653803,4.79583152331272,
                       6.2449979983984,6.40312423743285,6.85565460040104,
                       7.07106781186548,4.47213595499958,6.70820393249937,
                       8.83176086632785,7.28010988928052,6.78232998312527,9.2736184954957,
                       10),
         mean_temp = c(4.5,4.5,9.7,9.7,18.775,
                       18.775,18.775,11.45,11.283,14.57,14.625,-0.1,-0.1,
                       -0.1,-0.1,-0.1,7.8,7.866667,7.866667,9.38,7.866667,
                       7.866667),
       mean_precip = c(59.583,59.583,50.75,50.75,
                       44.17,44.17,44.17,66.33,66.33,128.5,128.5,120.9167,
                       120.9167,120.9167,120.9167,120.9167,97.83,129.17,
                       129.17,129.17,129.17,129.17),
        biom_trans = c(4.81227596881143,
                       4.5263671967705,5.55940644313761,4.46116576692685,5.06882530770197,
                       6.24911033667993,3.9572409018855,2.79508497187474,
                       7.15558523113239,NA,NA,6.00474812127869,3.76473770666696,
                       3.93921311939326,6.61585217489025,5.65141575182715,
                       9.19778052938135,6.56033535728167,7.12165711053263,
                       7.45318723768563,6.64928567592038,5.61773975901341)
)
                      
                head(dataframe)
                dataframe$Monthnum_f <- factor(dataframe$Monthnum)
                
                
                      mod <- lmer(log2(BSR+1) ~ Monthnum_f + lat_trans + long + biom_trans + PSR_trans + mean_precip + mean_temp +  I(el_trans^2) + el_trans + (1|Region), data=dataframe)
                      

                      (myeffects <- Effect("lat_trans",mod)  %>% as.data.frame())
                      
                      myeffects <- effect("lat_trans",mod)
                      ggplot() + 
                        geom_point(data=dataframe, aes(lat_trans, log2(BSR+1))) + 
                        geom_line(data=myeffects, aes(x=lat_trans, y = fit), color = "red") +
                        geom_ribbon(data=myeffects, aes(x=lat_trans, ymin = lower, ymax = upper), alpha = .2, color = NA) +
                        guides(color="none")+
                        theme_bw()+
                        theme_classic()+ 
                        coord_cartesian(xlim=range(dataframe$lat_trans))+
                        #guides("color"="none")+
                        labs(title = "LMM: BSR ~ Latitudetrans") +
                        xlab("log2(latitude)") +
                        ylab("log2(BSR+1)") +
                        theme(plot.title = element_text(hjust = 0.5))

And this is the Output I get:

Error in `fortify()`:
! `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class eff.
Run `rlang::last_error()` to see where the error occurred.

I don't get it. It's the same error I get when I use the whole dataset.

nirgrahamuk · August 10, 2023, 9:20am

yeah so you follow up a line of code that was enough, with some code that undoes it and puts it in undesirable state. this part :

(myeffects <- Effect("lat_trans",mod)  %>% as.data.frame())
myeffects <- effect("lat_trans",mod)

the second line is not giving a data frame; the first line explicitly does so. ggplot2 works with dataframes by default.

matschelarmando · August 10, 2023, 9:58am

Oh okay, I see! Many thanks.

However, is there any difference in the result when using effect or ggefects? Is one of them wrong? Or can I continue using ggeffects? I just ploted all my models using ggeffects now and I would be finished, so I can show them to my supervisor and discuss if I can start writing now

nirgrahamuk · August 10, 2023, 10:40am

is one of them wrong ? I dont think so in terms of how I understand wrong.
consider this simple analogy.
ggplot2 library includes diamonds dataset;

library(tidyverse)

diamonds |> group_by(cut(x,breaks=5)) |> 
  summarise(mean_carat = mean(carat))

diamonds |> group_by(cut(x,breaks=6)) |> 
  summarise(mean_carat = mean(carat))

these different analysis have similarities; they bracket the diamonds into different categories based on length (x) and report the mean carat rating for those.

matschelarmando · August 10, 2023, 11:14am

Okay so as long as I stick to one ,,method" consistently, I don't need to care for the other?

system · August 17, 2023, 11:14am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.