Hi everyone,
I have been making the statistics for my thesis and came quite along, even though I am a bloody beginner in R, of course with a lot of help from my supervisor/professor.
I set up individual lm and lmer investigating the effects of y~x in the past, but at the moment I am setting up a linear mixed model including all my variables to see how they explain changes in my dependant variable including interactions. I want to look at how abiotic and biotic factors (like latitude, elevation, or plant species richness) influence beetle species richness (BSR), with region as a random factor. I had to transform my data for example BSR is log2 transformed as is latitude, to get near normal distribution (at least my supervisor said it was okay).
After setting up the model I used ggeffect() to get predictions for the individual variables, but I am having issues getting a nice range of values. The following is the reprex (I hope I did it right...?):
dataframe <- data.frame( Region = c("Västerbotten","Västerbotten",
"Central Iberian Peninsula","Central Iberian Peninsula",
"Seville","Seville","Seville","Navarra","Navarra",
"Galicia","Galicia","Davos","Davos","Davos","Davos",
"Davos","Bern","Le Vaud","Le Vaud","Le Vaud",
"Le Vaud","Le Vaud"),
BSR = c(0,0,15,11,10,7,9,30,12,
7,2,3,3,1,10,15,6,6,5,8,2,9),
Abundance = c(0,0,63,26,15,8,19,117,
33,10,2,3,3,1,40,45,6,7,5,13,2,51),
Monthnum = c(8,8,6,6,5,5,5,6,6,6,6,
7,7,8,8,8,7,7,6,6,6,7),
long = c(20.435508,20.440477,-3.478333,
-3.469722,-6.31592777777777,-6.3174,
-6.31908055555555,-1.44398,-1.45902,-9.060556,-9.088944,9.878583,
9.85403057110996,9.8513,9.87216,9.82407,7.459026,
6.206838,6.246505,6.21909,6.214325,6.205685),
lat_trans = c(5.99605894988866,
5.99608216377688,5.35907598731684,5.3591931744351,5.2121342625697,
5.21210420872034,5.21205480259885,5.41714736248073,
5.41639862585785,5.41289815540624,5.41189602815859,
5.54763872937554,5.54722590843021,5.54759696616991,
5.54762719372172,5.54806230775415,5.55536291078723,
5.53950666034148,5.53809355385378,5.5365167267499,5.53886127499334,
5.5389647323021),
el_trans = c(6.32455532033676,
6.48074069840786,40.8778668719394,41.7013189239861,1,
1.4142135623731,1.4142135623731,26.8328157299975,27.3861278752583,
2.23606797749979,2.64575131106459,41.1096095821889,50,
50,43.1277173056956,39.6358423652128,23.0217288664427,
37.1483512420134,26.6082693913001,21.6101827849743,
33.436506994601,35.9444015112229),
PSR_trans = c(3.3166247903554,
2.64575131106459,6.92820323027551,6.557438524302,3.46410161513775,
3.3166247903554,3.74165738677394,6.48074069840786,
7.61577310586391,5.74456264653803,4.79583152331272,
6.2449979983984,6.40312423743285,6.85565460040104,
7.07106781186548,4.47213595499958,6.70820393249937,
8.83176086632785,7.28010988928052,6.78232998312527,9.2736184954957,
10),
mean_temp = c(4.5,4.5,9.7,9.7,18.775,
18.775,18.775,11.45,11.283,14.57,14.625,-0.1,-0.1,
-0.1,-0.1,-0.1,7.8,7.866667,7.866667,9.38,7.866667,
7.866667),
mean_precip = c(59.583,59.583,50.75,50.75,
44.17,44.17,44.17,66.33,66.33,128.5,128.5,120.9167,
120.9167,120.9167,120.9167,120.9167,97.83,129.17,
129.17,129.17,129.17,129.17),
biom_trans = c(4.81227596881143,
4.5263671967705,5.55940644313761,4.46116576692685,5.06882530770197,
6.24911033667993,3.9572409018855,2.79508497187474,
7.15558523113239,NA,NA,6.00474812127869,3.76473770666696,
3.93921311939326,6.61585217489025,5.65141575182715,
9.19778052938135,6.56033535728167,7.12165711053263,
7.45318723768563,6.64928567592038,5.61773975901341))
The following is my whole code I am using to plot the effect of latitude(transformed) on the beetles species richness:
dataframe$Monthnum_f <- factor(dataframe$Monthnum)
mod <- lmer(log2(BSR+1) ~ Monthnum_f + lat_trans + long + biom_trans + PSR_trans + mean_precip + mean_temp + I(el_trans^2) + el_trans + (1|Region), data=dataframe)
# Get the marginal effects using ggeffect()
effects <- ggeffect(mod, terms = "lat_trans")
ggplot() +
geom_point(data=dataframe, aes(lat_trans, log2(BSR+1))) +
geom_line(data=effects, aes(x = x, y = predicted), color = "red") +
geom_ribbon(data=effects, aes(x = x, ymin = conf.low, ymax = conf.high), alpha = .2, color = NA) +
guides(color="none")+
theme_bw()+
theme_classic()+
scale_y_continuous(breaks=c(0,1,2,3,4,5),
labels=c("0", "1", "3", "7", "15", "31"))+
scale_x_continuous(breaks=log2(c(0,32,40,50,60)),
labels=c(0,32,40,50,60))+
#guides("color"="none")+
labs(title = "LMM: BSR ~ Latitudetrans") +
xlab("log2(latitude)") +
ylab("log2(BSR+1)") +
theme(plot.title = element_text(hjust = 0.5))
!Be aware that this is only one part of my dataset! Which is why there are so few data points at the regression line shows a positive relationship, whereas with the whole dataset it shows a negative one!
It does plot me the effect and the effect is exactly as it is expected in theory or what I want, which is nice. But as you can see the plot or the x-axis is quite moved to the right (There is a gap at the beginning), because the predicted values generated by ggeffect() start where I have no data for "lat_trans", which is at 5.0. In general when I use the code below + as you can see in the plot by the blue dots:
# Extract the data frame with predicted values
predicted_values <- data.frame(lat_trans = effects$x, predicted = effects$predicted)
# Print the first few rows of the predicted_values data frame
head(predicted_values)
it show only 3 predicted points for lat_trans but I have way more, or at least I would expect it to predict more points...
Any idea how to fix this issue, so that I get more predicted values and so that there is no gap at the beginning of the x-axis? I also tried to used the "grid.levels" argument but it was useless... This is driving me crazy!
Thank you for any help!!
Cheers,
Marcel