Hi,
I am attempting to find the 95% CI of the c-statistic for ROC curves using bootstrapping. However, I keep getting the following error when I run this code:
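```r
library(boot)
# Statistic for boot(): refit the model on the resampled rows `i`
# and return its C-statistic (area under the ROC curve)
FUN <- function(d, i) {
  fit <- glm(LOS_quartiles ~ PH_Z + AGE_SURGERY + SEX + RACE + ETHNICITY +
               MARITAL_STATUS + INSURANCE, data = d[i, ], family = "binomial")
  DescTools::Cstat(fit)
}
res <- boot(Excluded_data, FUN, R = 999)
boot.ci(boot.out = res, type = "perc")  # percentile 95% CI
```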
Error: contrasts can be applied only to factors with 2 or more levels.
However, all of the factors and the continuous variables (AGE_SURGERY and PH_Z) have more than one unique value, and I have filtered out NA values. How can I resolve this issue?
Any suggestions would be much appreciated. Thanks in advance.
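A likely reason the `unique()` check passes is that it runs on the full dataset, while the model is refit on each bootstrap resample. A minimal sketch, assuming a hypothetical toy data frame `toy` standing in for Excluded_data and a hypothetical helper `rarest_level_count()`, that reports how many rows fall in the least common level of each categorical column:

```r
# Hypothetical toy data standing in for Excluded_data
toy <- data.frame(
  SEX  = factor(c(rep("F", 48), rep("M", 52))),
  RACE = factor(c(rep("A", 97), rep("B", 2), "C")),
  AGE_SURGERY = seq(40, 89, length.out = 100)
)

# For every factor/character column, count the rows in its least common level
rarest_level_count <- function(df) {
  is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  cat_cols <- names(df)[is_cat]
  vapply(cat_cols, function(col) min(table(df[[col]])), integer(1))
}

rarest_level_count(toy)
```

A level backed by only one or two rows (like `RACE == "C"` here) passes a `unique()` check but is the one most likely to be absent from a given resample.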
Please provide reproducible code.
Some bootstrap samples may contain only one level of a factor unless you stratify or guard against it.
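One way to guard against it, sketched under stated assumptions: have the statistic return NA whenever a resample leaves a factor (or the outcome) with fewer than two levels, then build the interval from the remaining replicates. The simulated data, the variable names, and the helpers `cstat()`, `too_few_levels()` and `guarded_stat()` are all hypothetical; `cstat()` is a rank-based stand-in for `DescTools::Cstat` so the sketch has no extra dependencies.

```r
library(boot)

set.seed(1)
# Hypothetical simulated data: RACE has a rare level "C" (3 of 60 rows)
n <- 60
sim <- data.frame(
  AGE  = rnorm(n, mean = 60, sd = 10),
  RACE = factor(c(rep("A", 40), rep("B", 17), rep("C", 3)))
)
sim$y <- rbinom(n, size = 1, prob = plogis(-3 + 0.05 * sim$AGE))

# Rank-based C-statistic: probability that a randomly chosen event has a
# higher predicted risk than a randomly chosen non-event (ties count 1/2)
cstat <- function(p, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# TRUE if any factor column has fewer than two levels present in `d`
too_few_levels <- function(d) {
  any(vapply(Filter(is.factor, d),
             function(x) nlevels(droplevels(x)) < 2, logical(1)))
}

# Statistic for boot(): return NA instead of stopping with the contrasts
# error when a resample leaves a factor (or the outcome) with one level
guarded_stat <- function(d, i) {
  d <- d[i, ]
  if (too_few_levels(d) || length(unique(d$y)) < 2) return(NA_real_)
  fit <- glm(y ~ AGE + RACE, data = d, family = binomial)
  cstat(fitted(fit), d$y)
}

res <- boot(sim, guarded_stat, R = 999)
n_skipped <- sum(is.na(res$t))          # replicates that could not be fit
ok <- res$t[!is.na(res$t)]
quantile(ok, probs = c(0.025, 0.975))   # percentile 95% interval
```

Report `n_skipped` alongside the interval: dropping replicates means the interval only reflects resamples in which every level is present. Stratified resampling through the `strata` argument of `boot()` is another option, though it takes a single vector of strata.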
Also, how big is your sample size?
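Sample size matters because of how often a rare level vanishes from a resample. If a level has $k$ of the $n$ rows, each of the $n$ independent draws misses it with probability $1 - k/n$, so under ordinary (unstratified) resampling with $R$ replicates:

```latex
P(\text{level absent from a resample}) = \left(1 - \frac{k}{n}\right)^{n} \approx e^{-k},
\qquad
\mathbb{E}[\text{replicates missing the level}] \approx R\, e^{-k}.
```

For example, a level with a single row ($k = 1$) is missing from roughly 37% of resamples, several hundred of 999; with $k = 5$ it is still about 0.7%, around 7 replicates. The contrasts error itself only appears when the missing level leaves the factor with one level, which is most likely for two-level factors with a rare category.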
Hi, thanks for the reply. I have updated the post with formatted code. The 'Excluded_data' variable is the dataset, and the variables in the glm command are either continuous (PH_Z, AGE_SURGERY) or categorical (all others). They all have more than one unique value when I check with the 'unique' function. Some variables have two levels, coded 0 and 1.
I can't help if I can't reproduce the problem on my end, so at a minimum you should provide a minimal dataset that reproduces your error, or use a built-in R dataset that shows the problem.
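As an example of the kind of minimal reproduction being asked for, the built-in `mtcars` data triggers the same error when a model is fit to rows in which a factor has only one level (here `factor(am)`, restricted to manual-transmission cars). This only illustrates the mechanism; it is not the original poster's data.

```r
# Fit on a subset where factor(am) has a single level -> same contrasts error
manual <- subset(mtcars, am == 1)
msg <- tryCatch(
  glm(vs ~ wt + factor(am), data = manual, family = binomial),
  error = function(e) conditionMessage(e)
)
print(msg)
```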
See here; it might be NA values or something else going on in your model fitting.
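A quick way to rule NA values in or out, sketched with a hypothetical data frame `toy_na`: `glm()` drops incomplete rows by default (`na.action = na.omit`), so a level that only occurs alongside missing values in another column can disappear from the data the model actually sees, even though the full column still has two levels.

```r
# Hypothetical data: level "M" of SEX only occurs in rows where AGE is NA
toy_na <- data.frame(
  SEX = factor(c("F", "F", "F", "M", "M")),
  AGE = c(50, 61, 47, NA, NA),
  y   = c(0, 1, 0, 1, 1)
)

colSums(is.na(toy_na))   # missing values per column: SEX 0, AGE 2, y 0

# Keep only the rows glm() would actually use, then count levels per factor
used <- droplevels(toy_na[complete.cases(toy_na), ])
sapply(Filter(is.factor, used), nlevels)   # SEX has 1 level -> contrasts error
```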
Apologies. I used the 'write.csv' function to export the data as CSV: the first row contains the variable names and each column holds the values for that variable. The 'Excluded_data_variable' file includes all the variables required by the code above except the PH_Z variable, which I have uploaded separately. That is why there are two files.
If PH_Z, which is referenced in your glm, comes from a different dataset than the rest of the data (SEX, RACE, etc.), then you must have some way of combining the data in your two files into the single Excluded_data, which seems to be the sole input to the code you want help with. It would make more sense to me for you to simply use saveRDS() on your Excluded_data and share that one dataset.
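A small illustration of why `saveRDS()` is the better way to share the data: a CSV round trip turns factors into character columns (in R 4.0 and later) and loses unused levels and level order, while `saveRDS()`/`readRDS()` returns an identical object. The data frame `dat` is hypothetical, and temporary files are used so nothing is written to the working directory.

```r
# Hypothetical data with a factor that has an unused level ("C")
dat <- data.frame(
  RACE = factor(c("B", "A", "B"), levels = c("B", "A", "C")),
  PH_Z = c(-0.4, 1.2, 0.3)
)

csv_file <- tempfile(fileext = ".csv")
write.csv(dat, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

rds_file <- tempfile(fileext = ".rds")
saveRDS(dat, rds_file)
from_rds <- readRDS(rds_file)

class(from_csv$RACE)       # "character" in R >= 4.0 (stringsAsFactors = FALSE)
identical(dat, from_rds)   # TRUE: factor levels and column types preserved
```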