Scale function on categorical variable producing NA values


I'm trying to mean-centre a categorical variable (interference) with 3 levels by recoding it as a numeric variable with the values -1, 0 and 1. However, when I use the scale function to do this, every value in the variable becomes NA, and I can't figure out why.

When I then run the GLMM, I get an 'Error: Invalid grouping factor specification, subject_nr' message, which I think may be caused by the earlier issue (subject_nr does exist in the d0 data frame).

Any help would be much appreciated!

d0$subject_nr = as.factor(d0$subject_nr)
d0$speech = scale(ifelse(d0$speech == 'distorted', 1,0), scale = FALSE)

# recode d0$interference as a numeric variable with values of -1, 0, and 1
d0$interference = as.numeric(d0$interference)-2
d0$int = scale(d0$interference, scale = FALSE)

# d0$int = scale(d0$interference == case_when(d0$interference == "none" ~ -1, 
#                                  d0$interference == "foot-tap" ~ 0,
#                                  d0$interference == "whisper" ~ 1), scale = FALSE)

# GLMM interaction model (familiarity and interference)
glmm2int = glmer(binary_resp ~ speech*int + (1+speech*int|subject_nr),
                 data = d0,
                 fam = binomial("logit"),
                 control = glmerControl(optimizer = "bobyqa"))


Your code works on a toy data set, as shown below.

DF <- data.frame(CAT = c("A","B","A","C"), stringsAsFactors = TRUE)
DF$CAT
#> [1] A B A C
#> Levels: A B C
DF$CAT <- as.numeric(DF$CAT) - 2
DF$CAT
#> [1] -1  0 -1  1
DF$CAT <- scale(DF$CAT, scale = FALSE)
DF$CAT
#>       [,1]
#> [1,] -0.75
#> [2,]  0.25
#> [3,] -0.75
#> [4,]  1.25
#> attr(,"scaled:center")
#> [1] -0.25

Created on 2023-05-10 with reprex v2.0.2

What is the result of

str(d0$interference)

using the original values of the column, before you use as.numeric()?

Thanks for your reply.

Hmm......I'm a bit stumped then.

The result of str(d0$interference) before using as.numeric() is chr [1:2160] "none" "none" "none" "none" "none" "none" "none" "none" "none" "none" "none" ...
Basically a third of those are 'none', another third 'foot-tap' and the final third 'whisper'.

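The column named interference is a character vector, not a factor, so as.numeric() returns NA for every value (it can only convert strings that look like numbers). On a factor, as.numeric() returns the integer level codes instead, which is what your recoding relies on. You can make it a factor with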

d0$interference <- factor(d0$interference, levels = c("none", "foot-tap", "whisper"))

then proceed with the code you had.
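To see the difference for yourself, here is a minimal sketch on a toy vector. The names x_chr, x_fct, num_from_chr and num_from_fct are made up for illustration and simply stand in for d0$interference.

```r
# Toy stand-in for d0$interference (hypothetical values)
x_chr <- c("none", "foot-tap", "whisper", "none")

# Character input: the strings cannot be parsed as numbers, so every value is NA
# (R also emits an "NAs introduced by coercion" warning, silenced here)
num_from_chr <- suppressWarnings(as.numeric(x_chr))
print(num_from_chr)

# Factor input: as.numeric() returns the integer level codes 1, 2, 3,
# so subtracting 2 gives none = -1, foot-tap = 0, whisper = 1
x_fct <- factor(x_chr, levels = c("none", "foot-tap", "whisper"))
num_from_fct <- as.numeric(x_fct) - 2
print(num_from_fct)
```

The first print should show only NA values and the second the -1/0/1 coding, which scale(..., scale = FALSE) can then centre.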


This worked - thank you!
I actually used as.factor() (creature of habit!) - not sure there's any difference though?

Anyway, thanks again. :)
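On the as.factor() question, there is one difference that may be worth checking: as.factor() on a character vector orders the levels alphabetically, whereas factor() with an explicit levels argument keeps the order you give it. The sketch below uses a toy vector x (a made-up name) to show how that changes the -1/0/1 coding.

```r
# Toy vector with one value per level (hypothetical stand-in for d0$interference)
x <- c("none", "foot-tap", "whisper")

# as.factor(): levels are sorted, so "foot-tap" comes first
lev_default <- levels(as.factor(x))
code_default <- as.numeric(as.factor(x)) - 2   # none = 0, foot-tap = -1, whisper = 1

# factor() with explicit levels: the order is the one you specify
f_explicit <- factor(x, levels = c("none", "foot-tap", "whisper"))
lev_explicit <- levels(f_explicit)
code_explicit <- as.numeric(f_explicit) - 2    # none = -1, foot-tap = 0, whisper = 1

print(lev_default)
print(code_default)
print(lev_explicit)
print(code_explicit)
```

If the intended coding is none = -1, foot-tap = 0, whisper = 1, the explicit levels version is the one that matches it; with as.factor() the model still runs, but the numeric values map to different conditions.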
