I am trying to fit a multilevel (mixed-effects) regression for my study.
I have two crossed random effects: participants (97) and items (the 20 words used in the study). Every participant spelled the same 20 words.
My outcome variable is spelling accuracy, which is binary: 1 for correct and 0 for incorrect.
My predictors are all continuous word features: word length, word frequency, OLD20 (neighbourhood density) and OLDF (neighbourhood frequency).
I want to use the raw, trial-level spelling scores without aggregating them first, which is why I need a multilevel model.
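Using the raw scores means the data need to be in long format: one row per spelling attempt, so 97 × 20 = 1,940 rows. A quick sanity check on that layout might look like the sketch below. The miniature data set (two participants, three made-up words and scores) is purely hypothetical and only stands in for `combined_df`.

```r
# Hypothetical miniature of the long-format layout (made-up words and scores):
# one row per participant x word attempt, word-level predictors repeated on each row.
combined_df <- data.frame(
  participants      = rep(c("p1", "p2"), each = 3),
  items             = rep(c("cat", "ship", "island"), times = 2),
  wordlength_letter = rep(c(3, 4, 6), times = 2),
  spelling_accuracy = c(1, 1, 0, 1, 0, 0)
)

# Every participant should have exactly one attempt per word (all cells equal 1)
attempts <- table(combined_df$participants, combined_df$items)
print(attempts)

# A word-level predictor should take exactly one value within each word
values_per_word <- tapply(combined_df$wordlength_letter, combined_df$items,
                          function(x) length(unique(x)))
print(all(values_per_word == 1))
```

With the real data, the same table would be 97 × 20 with every cell equal to 1.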
I'm trying to work out the correct code but haven't had any luck so far. This is what I have:

```r
library(lme4)

M1 <- lmer(spelling_accuracy ~ 1 + OLD20 + OLDF + wordlength_letter + word_frequency +
             (1 | items) + (1 | participants),
           data = combined_df, REML = FALSE)
```
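Because `spelling_accuracy` is a 0/1 outcome, the usual choice would be a logistic mixed model, `glmer()` with `family = binomial`, rather than the linear `lmer()` call above (the `summ()` output further down does report "Mixed effects linear regression"). The sketch below fits that model to simulated stand-in data; every value in it is made up. Z-scoring the predictors is an optional step that often helps `glmer()` converge, not something the design requires. `glmer()` fits by (Laplace-approximated) maximum likelihood, so there is no `REML` argument.

```r
library(lme4)

# Simulated stand-in for combined_df: 97 participants x 20 words (all values made up)
set.seed(42)
words <- data.frame(
  items             = factor(paste0("w", 1:20)),
  wordlength_letter = sample(3:10, 20, replace = TRUE),
  word_frequency    = runif(20, 1, 6),
  OLD20             = runif(20, 1.5, 3.5),
  OLDF              = runif(20, 4, 13)
)
combined_df <- merge(
  expand.grid(participants = factor(paste0("p", 1:97)), items = words$items),
  words, by = "items"
)
combined_df$spelling_accuracy <- rbinom(nrow(combined_df), 1, 0.6)

# Optional: z-score the continuous word-level predictors (often eases convergence)
num_vars <- c("OLD20", "OLDF", "wordlength_letter", "word_frequency")
combined_df[num_vars] <- lapply(combined_df[num_vars], function(x) as.numeric(scale(x)))

# Logistic mixed model with crossed random intercepts for participants and items
M1 <- glmer(
  spelling_accuracy ~ OLD20 + OLDF + wordlength_letter + word_frequency +
    (1 | items) + (1 | participants),
  data = combined_df,
  family = binomial(link = "logit")
)
summary(M1)
```

Provided the four predictors are genuinely numeric, `summary(M1)` should show one coefficient per predictor, on the log-odds scale.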
This gives me the following warnings:

```
fixed-effect model matrix is rank deficient so dropping 18 columns / coefficients
boundary (singular) fit: see help('isSingular')
```
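A plausible reading of these warnings, assuming OLD20 (and probably OLDF) ended up stored as text or factors rather than numbers: every predictor is a property of the word, and there are only 20 words, so the fixed-effect model matrix can hold at most 20 linearly independent columns. A categorical OLD20 with 18 distinct values uses up nearly all of them, and lme4 drops whatever is left over. Once the fixed effects absorb all differences between words, the item variance is estimated at zero, which would also fit the singular-fit message (lme4's `isSingular()` checks this directly). The base-R sketch below reproduces the rank problem with 20 hypothetical OLD20 values stored as text.

```r
# 20 hypothetical words; OLD20 stored as text, so R treats it as categorical
words <- data.frame(
  OLD20             = sprintf("%.2f", 1.5 + (0:19) / 10),  # 20 distinct values
  wordlength_letter = rep(4:8, times = 4),
  stringsAsFactors  = FALSE
)
# Long format: each word repeated for 97 participants
trials <- words[rep(seq_len(nrow(words)), times = 97), ]

X <- model.matrix(~ OLD20 + wordlength_letter, data = trials)
cat("columns:", ncol(X), "\n")      # intercept + 19 OLD20 dummies + 1 slope = 21
cat("rank:", qr(X)$rank, "\n")      # 20: one column is redundant and would be dropped

# The same predictors with OLD20 as a number: 3 columns, full rank
trials$OLD20_num <- as.numeric(trials$OLD20)
X_num <- model.matrix(~ OLD20_num + wordlength_letter, data = trials)
cat("numeric version:", ncol(X_num), "columns, rank", qr(X_num)$rank, "\n")
```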
When I then check the model with:

```r
library(jtools)
summ(M1)
```

I get the following output:
```
MODEL INFO:
Observations: 1940
Dependent Variable: spelling_accuracy
Type: Mixed effects linear regression

MODEL FIT:
AIC = 1658.29, BIC = 1786.41
Pseudo-R² (fixed effects) = 0.14
Pseudo-R² (total) = 0.53

FIXED EFFECTS:
----------------------------------------------------------
                     Est.   S.E.   t val.      d.f.      p
----------------- ------- ------ -------- --------- ------
(Intercept)          0.52   0.05    11.01    397.48   0.00
OLD201.7             0.19   0.05     3.79   1843.00   0.00
OLD201.85            0.26   0.05     5.26   1843.00   0.00
OLD201.9             0.14   0.05     2.94   1843.00   0.00
OLD201.95            0.22   0.05     4.42   1843.00   0.00
OLD202.0             0.12   0.05     2.52   1843.00   0.01
OLD202.25           -0.40   0.05    -8.20   1843.00   0.00
OLD202.35           -0.01   0.05    -0.21   1843.00   0.83
OLD202.45            0.07   0.05     1.47   1843.00   0.14
OLD202.5            -0.01   0.05    -0.21   1843.00   0.83
OLD202.65            0.16   0.05     3.37   1843.00   0.00
OLD202.7            -0.02   0.05    -0.42   1843.00   0.67
OLD202.9            -0.06   0.05    -1.26   1843.00   0.21
OLD203.0             0.20   0.05     4.00   1843.00   0.00
OLD203.05           -0.44   0.05    -9.05   1843.00   0.00
OLD203.35           -0.02   0.05    -0.42   1843.00   0.67
OLD203.4            -0.16   0.05    -3.37   1843.00   0.00
OLD203.5             0.15   0.05     3.16   1843.00   0.00
OLDF12.7            -0.21   0.05    -4.21   1843.00   0.00
OLDF4.6              0.04   0.05     0.84   1843.00   0.40
----------------------------------------------------------

p values calculated using Satterthwaite d.f.

RANDOM EFFECTS:
----------------------------------------
Group          Parameter        Std. Dev.
-------------- ------------- -----------
participants   (Intercept)          0.31
items          (Intercept)          0.00
Residual                            0.34
----------------------------------------

Grouping variables:
--------------------------------
Group            # groups    ICC
-------------- ---------- ------
participants           97   0.45
items                  20   0.00
--------------------------------
```
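Names such as `OLD201.7` match R's usual labelling for dummy-coded factors: the variable name pasted to the level, with one column per level except the reference level. A small sketch with a few hypothetical OLD20 values shows that a text column produces level-named dummies, while the same values stored as numbers produce a single `OLD20` slope.

```r
# Hypothetical OLD20 values, first as text, then as numbers
old20_text <- data.frame(OLD20 = c("1.7", "1.85", "2.25", "1.7"), stringsAsFactors = FALSE)
old20_num  <- data.frame(OLD20 = as.numeric(old20_text$OLD20))

# Text is treated as a factor: one dummy per non-reference level ("1.7" is the reference)
text_cols <- colnames(model.matrix(~ OLD20, data = old20_text))
print(text_cols)

# Numeric: a single slope column
num_cols <- colnames(model.matrix(~ OLD20, data = old20_num))
print(num_cols)
```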
I'm not sure why the output shows only OLD20 (plus two OLDF terms) and nothing for word length or word frequency, or why OLD20 is split into separate levels. I have made sure to code the variables correctly:

```r
as.numeric(combined_df$OLD20)
as.numeric(combined_df$OLDF)
as.factor(combined_df$spelling_accuracy)
```
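Two things may be worth checking in this conversion step. First, `as.numeric(combined_df$OLD20)` on its own only returns (and prints) a converted copy; the column in `combined_df` stays as it was unless the result is assigned back. Second, if the column is a factor, `as.numeric()` returns the internal level codes rather than the numbers shown on screen, so the usual idiom is `as.numeric(as.character(x))`. A minimal sketch, using a hypothetical three-row data frame in which OLD20 and OLDF were read in as factors:

```r
# Hypothetical columns read in as factors (e.g. read.csv(..., stringsAsFactors = TRUE))
combined_df <- data.frame(
  OLD20             = factor(c("2.25", "1.7", "3.05")),
  OLDF              = factor(c("12.7", "4.6", "8.1")),
  spelling_accuracy = c(1, 0, 1)
)

# Pitfall: as.numeric() on a factor gives level codes, not the values
print(as.numeric(combined_df$OLD20))   # 2 1 3

# Convert via character and assign the result back to the column
combined_df$OLD20 <- as.numeric(as.character(combined_df$OLD20))
combined_df$OLDF  <- as.numeric(as.character(combined_df$OLDF))

# A 0/1 outcome can stay numeric for glmer(family = binomial); no factor needed
str(combined_df)
```

On the real data, `str(combined_df)` after the conversion should list OLD20 and OLDF as `num`.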
Any help with fixing my code for the model would be really appreciated.