I am working on a study that examines the impact of a nutritional label on the food choices of students in a canteen. Each student chooses a starter, a main course and a dessert to make up their meal, and within each category the available items carry nutritional scores A, B, C, D or E. We built large Bayesian regression models using bayesglm in which, for each food item, we test the effect of participant-related factors (age, gender, BMI, etc.) and the effect of the other items chosen for the same plate, all in interaction with the nutritional label (we also have data from sessions where no nutritional label was displayed during the experiment).
The function I call in RStudio looks like this: bread <- bayesglm(bread ~ labeldisplay*(gender + scholarship + age + BMI + hunger + starter_A + starter_B + starter_C + starter_D + main_A + main_B + main_C + main_D + dairy_A + dairy_B + dairy_C + dairy_D + dessert_A + dessert_B + dessert_C + dessert_D + dessert_E), family = "binomial")
In simpler models, I usually sum the relevant estimates together with the intercept and then apply inv.logit to get probabilities. Here, though, I have a large model with continuous variables such as age and BMI, and I also need to account for the many other components of each meal. How do I calculate probabilities from the estimates in such a model?
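For instance, in a simpler model my hand calculation looks like this (the coefficient values here are made up purely for illustration):

```r
# Simpler case: sum the intercept and the relevant estimates on the
# logit scale, then apply the inverse logit to get a probability.
# All coefficient values below are invented for illustration.
intercept   <- -1.20
b_label     <-  0.80   # estimate for label displayed
b_starter_A <-  0.45   # estimate for starter A on the plate

logit_value <- intercept + b_label + b_starter_A
plogis(logit_value)    # plogis() is base R's equivalent of inv.logit
```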
Thank you for taking the time to read my question!
Thank you for your answer. But when I use predict(), I get about a thousand values back. What I'm looking for is, for example, the probability of choosing bread when starter A has already been chosen.
Thanks again for taking the time to read my question.
I think you would construct a data frame in which the independent variables represent the example(s) you want to predict over. If that's just one observation, you would get one result.
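A minimal, self-contained sketch of that idea, using simulated data and plain glm in place of bayesglm (the variable names here are invented):

```r
# Simulate a small binary-outcome data set
set.seed(1)
d <- data.frame(
  x = rnorm(100),
  g = factor(sample(c("A", "B"), 100, replace = TRUE))
)
d$y <- rbinom(100, 1, plogis(0.5 * d$x + (d$g == "A")))

fit <- glm(y ~ x + g, data = d, family = binomial)

# One-row data frame describing the single case of interest
newdat <- data.frame(x = 0.5, g = "A")
predict(fit, newdata = newdat, type = "response")  # a single probability
```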
You could automate the process of selecting the combinations of predictor values you want to predict at. Here's an example with some built-in data. To create the prediction data frame pred, we take three quantiles of each numeric column and all unique levels of each categorical column, combine these into a data frame containing every combination of those values, and run predict to get the probabilities.
library(arm)       # for bayesglm
library(vcd)       # for the Baseball data
library(tidyverse)

# Get a data subset for modeling
d = Baseball %>%
  select(league86, league87, atbat86, homer86, posit86) %>%
  filter(posit86 %in% c("1B", "2B", "C"))

m1 = bayesglm(league87 ~ atbat86 + homer86 + posit86 + league86,
              data = d, family = binomial,
              drop.unused.levels = FALSE)

# Create data frame with desired combinations of independent variables,
# then add the predicted probabilities
pred = d %>%
  select(-league87) %>%
  map(
    # Three quantiles for numeric columns; all unique values otherwise
    ~ if (is.numeric(.x)) quantile(.x, prob = c(0.25, 0.5, 0.75)) else unique(.x)
  ) %>%
  expand.grid() %>%
  mutate(predicted = predict(m1, ., type = "response"))

head(pred)
#> league86 atbat86 homer86 posit86 predicted
#> 1 N 255.00 4 2B 0.89825874
#> 2 A 255.00 4 2B 0.07354152
#> 3 N 371.00 4 2B 0.92800690
#> 4 A 371.00 4 2B 0.10385790
#> 5 N 515.75 4 2B 0.95385433
#> 6 A 515.75 4 2B 0.15672025