Binary logistic regression - singular fit

Hi, :slight_smile:
I need some help interpreting the output of a binary logistic regression GLMM that accounts for each subject being used in 3 replicate trials. My experiment tests whether sponge.side is a significant predictor of decision.side (that is, whether a subject is attracted to the side of a Y maze that holds a marine sponge). I'm worried because the estimate is unusually large (see image below) and the fit reports "Hessian is numerically singular: parameters are not uniquely determined". I'm wondering whether this happens because my experiment has only 15 trials in total, with 5 subjects each used in 3 replicates. I have attached the R code and an image of the output below.

Help would be greatly appreciated! Thanks in advance :slight_smile:

# Load packages -- do this every time

library(lme4) # For lmer function
#Loading required package: Matrix
library(car) # For F-tests, likelihood ratio and Wald chi-squared tests
library(dplyr)

# Read data into a data frame

y_maze_sponge_12hr_data <- read.csv("sponge_12hr.csv")

head(y_maze_sponge_12hr_data)

# Sample sizes

table(y_maze_sponge_12hr_data$sponge.side)

#remove NAs
clean_y_maze_sponge_12hr_data <- y_maze_sponge_12hr_data %>%
filter(!is.na(decision.side))

table(clean_y_maze_sponge_12hr_data$sponge.side)

#frequency table
net = table(clean_y_maze_sponge_12hr_data$decision.side, clean_y_maze_sponge_12hr_data$sponge.side); net

sponge.side = as.factor(clean_y_maze_sponge_12hr_data$sponge.side)

# Fit a mixed model

glm = glmer(clean_y_maze_sponge_12hr_data$decision.side ~ clean_y_maze_sponge_12hr_data$sponge.side + (1 | clean_y_maze_sponge_12hr_data$ï..subject), family = binomial)
summary(glm)
#p val = 0.00145 ** (Accounting for repeated measures)

Please post the output of

dput(clean_y_maze_sponge_12hr_data)

Put a line with three back ticks just before and after the pasted output, like this:
```
output of dput()
```
Separately, this line of code

sponge.side = as.factor(clean_y_maze_sponge_12hr_data$sponge.side)

is not changing the sponge.side column in the data frame. It is making a new variable named sponge.side.
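The difference can be sketched with a small stand-in data frame. The name `df` and its three rows are made up for illustration; only the column names follow the thread.

```r
# Hypothetical stand-in data; not the poster's real file.
df <- data.frame(
  sponge.side = c("left", "right", "left"),
  decision.side = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# This creates a separate object in the workspace; df is untouched.
sponge.side <- as.factor(df$sponge.side)
is.factor(df$sponge.side)   # FALSE

# Assigning back into the data frame changes the column itself.
df$sponge.side <- as.factor(df$sponge.side)
is.factor(df$sponge.side)   # TRUE
```

With the column converted in place, the model can refer to bare column names and pass the data frame through the `data` argument, for example `glmer(decision.side ~ sponge.side + (1 | subject), data = clean_data, family = binomial)`, assuming the subject column is named `subject`.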

image

dput(clean_y_maze_sponge_12hr_data)

Hi, here is the output. :slight_smile:

I am no expert in this kind of regression but I'll make a few observations. Here is your data frame and a summary of it using table(). I shortened the data frame name.

clean_data <- data.frame(
  subject = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
  sponge.side = c("left","right", "left","right", "left", "right","right",
                  "left","left","right","left","left","right"),
  decision.side = c(1,0,0,0,0,0,0,0,1,0,1,1,0))
clean_data
#>    subject sponge.side decision.side
#> 1        1        left             1
#> 2        1       right             0
#> 3        2        left             0
#> 4        2       right             0
#> 5        2        left             0
#> 6        3       right             0
#> 7        3       right             0
#> 8        3        left             0
#> 9        4        left             1
#> 10       4       right             0
#> 11       4        left             1
#> 12       5        left             1
#> 13       5       right             0

table(clean_data$sponge.side, clean_data$decision.side)
#>        
#>         0 1
#>   left  3 4
#>   right 6 0

Created on 2023-10-18 with reprex v2.0.2
The summary shows that when sponge.side is right, decision.side is always 0, while a sponge.side of left yields a decision.side of 1 in 4 of 7 trials. This implies a strong effect of sponge.side = right, but there is no way to determine a limit on how strong the effect is. As far as you can tell from the data, it is infinitely strong. The fit returns a coefficient of -5326, but, as you observed, that magnitude is absurdly large.
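This pattern is commonly called separation. A rough way to see why the estimate runs away is to evaluate the log-likelihood of a plain (non-mixed) logistic model at ever more negative slopes. The sketch below is only an illustration: it ignores the subject effect, fixes the intercept at the logit of 4/7 (the observed proportion for left), and the helper name `loglik` is made up.

```r
# Data as posted in the thread (subject column omitted here).
clean_data <- data.frame(
  sponge.side = c("left", "right", "left", "right", "left", "right", "right",
                  "left", "left", "right", "left", "left", "right"),
  decision.side = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)
)

# Log-likelihood of a simple logistic model with intercept b0 and
# slope b1 for sponge.side == "right" (hypothetical helper).
loglik <- function(b1, b0 = qlogis(4 / 7)) {
  is_right <- clean_data$sponge.side == "right"
  p <- plogis(b0 + b1 * is_right)
  sum(dbinom(clean_data$decision.side, size = 1, prob = p, log = TRUE))
}

slopes <- c(-1, -5, -10, -20)
ll <- sapply(slopes, loglik)
data.frame(slope = slopes, loglik = ll)
```

The log-likelihood keeps rising as the slope becomes more negative, so the optimizer has no finite value to settle on and stops at some arbitrarily large number.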
Introducing a subject-dependent intercept doesn't make a lot of sense when you have only 2 or 3 observations per subject. The intercept is going to be highly uncertain. With your particular data, two of the subjects have decision.side = 0 in all cases, so there is no way to set a lower bound on their intercepts.
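A quick per-subject summary makes this visible. This is a minimal sketch using the data as posted; the variable names `prop_by_subject` and `all_zero` are only illustrative.

```r
# Subject and outcome columns as posted in the thread.
clean_data <- data.frame(
  subject = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  decision.side = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)
)

# Proportion of decision.side == 1 for each subject.
prop_by_subject <- tapply(clean_data$decision.side, clean_data$subject, mean)
prop_by_subject

# Subjects whose outcome never varies (always 0).
all_zero <- names(prop_by_subject)[prop_by_subject == 0]
all_zero
```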
You probably want to know how to proceed. I can't give you good advice about that because I don't really understand your data or your goal. Having more data would help a lot. It would give you better estimates of the intercepts, and it would probably produce some cases where sponge.side = right yields decision.side = 1.
Remember I'm just some random guy on the internet.

As an aside, don't post a picture of the output of dput(), post the actual text from the console, like this:

dput(clean_data)
structure(list(subject = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 
5), sponge.side = c("left", "right", "left", "right", "left", 
"right", "right", "left", "left", "right", "left", "left", "right"
), decision.side = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)), class = "data.frame", row.names = c(NA, 
-13L))

Others can copy that structure() function and easily replicate the data.
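For example, pasting the text above on the right-hand side of an assignment rebuilds the data frame in a fresh session:

```r
# Recreate the data frame from the dput() text posted above.
clean_data <- structure(list(subject = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5,
5), sponge.side = c("left", "right", "left", "right", "left",
"right", "right", "left", "left", "right", "left", "left", "right"
), decision.side = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)), class = "data.frame", row.names = c(NA,
-13L))

# Check that the structure matches what was posted.
str(clean_data)
```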
