"Raw" and "Prob" differences in predict() for "nnet" package, tidymodels

I am training a classification "nnet" model via the "tidymodels" packages.
When I call predict() on the fitted workflow, type = "raw" and type = "prob" return different values, and the two do not even sum to 1.

#install.packages("nnet") #v7.3-17
#install.packages("tidymodels") #v0.1.4

library(tidymodels)

n = 1000
set.seed(123)
df = tibble(var_1 = runif(n),
            var_2 = runif(n),
            var_3 = rnorm(n), 
            label = factor(sample(c("accept","reject"), n, replace = TRUE)))

df_split = rsample::initial_split(df, prop = (3/4)) 

mlp_recipe <- recipe(label ~ ., data = df) 

model_spec = mlp() %>%
  set_mode("classification") %>%
  set_engine("nnet")

model_workflow = workflow() %>% 
  add_recipe(mlp_recipe) %>%
  add_model(model_spec)

model_fit <- model_workflow %>% 
  fit(data = training(df_split))

# Difference in predictions

prob_pred = predict(model_fit, training(df_split), type = 'prob') %>% # notice type = "prob"
  pull(1) # first column, i.e. the predicted probability of "accept"

raw_pred = predict(model_fit, training(df_split), type = 'raw') %>% # notice type = "raw"
  as.vector()

bind_cols(prob_pred = prob_pred, raw_pred = raw_pred) %>%
  mutate(sum = prob_pred + raw_pred) %>% # the sum doesn't add to 1, which rules out raw simply being the 1 - p case
  head()
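
One way to narrow down what "raw" contains is to check three things at once: whether it equals "prob", whether it is the complement of "prob", and whether the two are at least monotonically related. The following is only a minimal sketch in base R. The helper name compare_preds is hypothetical (it is not part of parsnip or nnet), and the vectors are stand-in data with an arbitrary decreasing curve, not output from the model above.

```r
# Hypothetical helper: summarise how two prediction vectors relate.
compare_preds <- function(prob, raw) {
  list(
    # Largest deviation from prob + raw == 1 (the "1 - p" case)
    max_complement_gap = max(abs(prob + raw - 1)),
    # Largest deviation from prob == raw
    max_identity_gap = max(abs(prob - raw)),
    # Spearman rank correlation: +1 or -1 indicates a strictly monotone relationship
    rank_cor = cor(prob, raw, method = "spearman")
  )
}

# Stand-in data (an assumption, not output from the fitted workflow):
# raw is a score in (0, 1); prob is an arbitrary monotone decreasing function of it.
set.seed(1)
raw  <- runif(50)
prob <- 1 / (1 + exp(2 * raw - 1))

res <- compare_preds(prob, raw)
print(res)
```

With the real vectors, compare_preds(prob_pred, raw_pred) could be called in the same way. A rank correlation close to +1 or -1 together with non-zero gaps would suggest that one output is a transformed version of the other rather than an unrelated quantity.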

raw is not the probability; it's an intermediate output. I don't know what the exact transformation is or how it comes about, but fitting a curve from raw to prob, as in the code below, shows the principle. The short answer: if you want probabilities, ask for them with type = 'prob', as you have done, and simply ignore 'raw'.


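# Put both prediction types side by side
(both <- bind_cols(prob_pred = prob_pred, raw_pred = raw_pred))

# Approximate the raw -> prob mapping with a cubic polynomial
(raw_to_prob <- lm(prob_pred ~ poly(raw_pred, 3), data = both))
both$pred_from_raw <- predict(raw_to_prob, newdata = both)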
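
To judge how well such a fitted curve reproduces the probabilities, it may help to look at the largest residual and the R squared of the fit. The sketch below repeats the cubic fit on stand-in data in base R. The data frame and the decreasing curve used to generate prob_pred are assumptions for illustration, not values from the model above, and the names max_abs_error and r_squared are made up here.

```r
# Stand-in data (an assumption): raw scores in (0, 1) and probabilities
# generated from them by an arbitrary smooth, monotone decreasing curve.
set.seed(42)
both <- data.frame(raw_pred = runif(200))
both$prob_pred <- 1 / (1 + exp(2 * both$raw_pred - 1))

# Same idea as above: cubic polynomial regression of prob on raw
raw_to_prob <- lm(prob_pred ~ poly(raw_pred, 3), data = both)
both$pred_from_raw <- predict(raw_to_prob, newdata = both)

# How closely does the fitted curve reproduce the probabilities?
max_abs_error <- max(abs(both$pred_from_raw - both$prob_pred))
r_squared <- summary(raw_to_prob)$r.squared

cat("max abs error:", max_abs_error, "\n")
cat("R squared:", r_squared, "\n")
```

If the maximum error is tiny and R squared is close to 1 on the real predictions as well, that supports the idea that "prob" is a smooth deterministic function of "raw". It does not by itself reveal which function is applied.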