Binary logistic regression - singular fit

kaito · October 18, 2023, 2:34pm

Hi,
I need some help interpreting the output of a binary logistic regression GLMM that accounts for each subject being used in 3 replicate trials. My experiment tests whether sponge.side is a significant predictor of decision.side (that is, whether a subject is attracted to the side of a Y maze that holds a marine sponge). I'm worried because the estimate is unusually large (see image below) and the fit reports "Hessian is numerically singular: parameters are not uniquely determined". I'm wondering whether this is happening because my experiment has only 15 trials in total, with 5 subjects used in 3 replicates each. I have attached the R code and an image of the output below.

Help would be greatly appreciated! Thanks in advance

# Load packages -- do this every time

library(lme4) # For lmer function
#Loading required package: Matrix
library(car) # For F-tests, likelihood ratio and Wald chi-squared tests
library(dplyr)

# Read data into a data frame

y_maze_sponge_12hr_data <- read.csv("sponge_12hr.csv")

head(y_maze_sponge_12hr_data)

# Sample sizes

table(y_maze_sponge_12hr_data$sponge.side)

#remove NAs
clean_y_maze_sponge_12hr_data <- y_maze_sponge_12hr_data %>%
filter(!is.na(decision.side))

table(clean_y_maze_sponge_12hr_data$sponge.side)

#frequency table
net <- table(clean_y_maze_sponge_12hr_data$decision.side, clean_y_maze_sponge_12hr_data$sponge.side)
net

sponge.side = as.factor(clean_y_maze_sponge_12hr_data$sponge.side)

# Fit a mixed model

glm = glmer(clean_y_maze_sponge_12hr_data$decision.side ~ clean_y_maze_sponge_12hr_data$sponge.side + (1 | clean_y_maze_sponge_12hr_data$ï..subject), family = binomial)
summary(glm)
#p val = 0.00145 ** (Accounting for repeated measures)

FJCC · October 18, 2023, 2:52pm

Please post the output of

dput(clean_y_maze_sponge_12hr_data)

Put a line with three backticks just before and just after the pasted output, like this:
```
output of dput()
```
Separately, this line of code

sponge.side = as.factor(clean_y_maze_sponge_12hr_data$sponge.side)

is not changing the sponge.side column in the data frame. It is making a new variable named sponge.side.
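A minimal sketch of the difference, using a small made-up data frame (`toy` and its values are hypothetical, not the poster's data). Assigning the result of as.factor() to a bare name creates a separate object, while assigning it back to the column changes the data frame itself.

```r
# Hypothetical stand-in for the real data frame.
toy <- data.frame(
  sponge.side   = c("left", "right", "left", "right"),
  decision.side = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Creates a new object called sponge.side; the column in toy is untouched.
sponge.side <- as.factor(toy$sponge.side)
class_before <- class(toy$sponge.side)
print(class_before)

# Assigning back into the data frame converts the column itself.
toy$sponge.side <- as.factor(toy$sponge.side)
class_after <- class(toy$sponge.side)
print(class_after)
```

Once the column itself is a factor, the model formula can refer to plain column names and the data frame can be passed through the `data` argument, rather than writing `data_frame$column` inside the formula.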

kaito · October 19, 2023, 3:50am

dput(clean_y_maze_sponge_12hr_data)

Hi, here is the output.

FJCC · October 19, 2023, 5:03am

I am no expert in this kind of regression but I'll make a few observations. Here is your data frame and a summary of it using table(). I shortened the data frame name.

clean_data <- data.frame(
  subject = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
  sponge.side = c("left","right", "left","right", "left", "right","right",
                  "left","left","right","left","left","right"),
  decision.side = c(1,0,0,0,0,0,0,0,1,0,1,1,0))
clean_data
#>    subject sponge.side decision.side
#> 1        1        left             1
#> 2        1       right             0
#> 3        2        left             0
#> 4        2       right             0
#> 5        2        left             0
#> 6        3       right             0
#> 7        3       right             0
#> 8        3        left             0
#> 9        4        left             1
#> 10       4       right             0
#> 11       4        left             1
#> 12       5        left             1
#> 13       5       right             0

table(clean_data$sponge.side, clean_data$decision.side)
#>        
#>         0 1
#>   left  3 4
#>   right 6 0

Created on 2023-10-18 with reprex v2.0.2
The summary shows that when sponge.side is right, decision.side is always 0, while a sponge.side of left gives decision.side = 1 in 4 of 7 trials. This implies a strong effect of sponge.side = right, but there is no way to determine a limit on how strong the effect is. As far as you can tell from the data, it is infinitely strong. The fit returns a coefficient of -5326 but, as you observed, that magnitude is absurdly large.
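This pattern, where one level of the predictor always gives the same outcome, is usually called quasi-complete separation. Here is a rough sketch of how it could be checked on the data above. It uses a plain glm() with no subject term, which is a simplification for illustration and not the model from the original post; the names `tab`, `has_empty_cell`, `fit` and `coefs` are introduced here.

```r
# Same data as in the reply above.
clean_data <- data.frame(
  subject = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  sponge.side = c("left", "right", "left", "right", "left", "right", "right",
                  "left", "left", "right", "left", "left", "right"),
  decision.side = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)
)

# An empty cell in the cross-tabulation is the warning sign:
# sponge.side = "right" never occurs together with decision.side = 1.
tab <- table(clean_data$sponge.side, clean_data$decision.side)
has_empty_cell <- any(tab == 0)
print(has_empty_cell)

# Ordinary logistic regression without the random intercept (an assumption
# made to keep the sketch in base R). The estimate for "right" drifts toward
# minus infinity, so it comes out large and negative with a huge standard error.
fit <- glm(decision.side ~ sponge.side, data = clean_data, family = binomial)
coefs <- summary(fit)$coefficients
print(coefs["sponge.sideright", c("Estimate", "Std. Error")])
```

The exact number the optimizer stops at is arbitrary, which is why a very large estimate paired with a very large standard error should be read as "not estimable from these data" rather than as a real effect size.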
Introducing a subject-dependent intercept doesn't make a lot of sense when you have 2 or 3 observations per subject. The intercept is going to be highly uncertain. With your particular data, two of the subjects (2 and 3) have decision.side = 0 in all cases, so there is no way to set a lower bound on their intercepts.
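A small sketch of how the per-subject picture could be tabulated, again on the data frame shown above. The names `n_obs`, `n_ones` and `no_variation` are introduced here for illustration.

```r
# Same data as in the reply above.
clean_data <- data.frame(
  subject = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  sponge.side = c("left", "right", "left", "right", "left", "right", "right",
                  "left", "left", "right", "left", "left", "right"),
  decision.side = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)
)

# Number of trials per subject and number of trials with decision.side = 1.
n_obs  <- as.vector(table(clean_data$subject))
n_ones <- tapply(clean_data$decision.side, clean_data$subject, sum)

# Subjects whose response never varies (all 0 or all 1) carry no information
# about where their own intercept sits on the logit scale.
no_variation <- names(n_ones)[n_ones == 0 | n_ones == n_obs]
print(data.frame(subject = names(n_ones), n_obs = n_obs, n_ones = as.vector(n_ones)))
print(no_variation)
```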
You probably want to know how to proceed. I can't give you good advice about that because I don't really understand your data or your goal. Having more data would help a lot. It would give you better estimates of the intercepts, and it would probably give you some cases where sponge.side = right yields decision.side = 1.
Remember I'm just some random guy on the internet.

FJCC · October 19, 2023, 5:06am

As an aside, don't post a picture of the output of dput(), post the actual text from the console, like this:

dput(clean_data)
structure(list(subject = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 
5), sponge.side = c("left", "right", "left", "right", "left", 
"right", "right", "left", "left", "right", "left", "left", "right"
), decision.side = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)), class = "data.frame", row.names = c(NA, 
-13L))

Others can copy that structure() function and easily replicate the data.
