- When I pass a traditional model object to `augment()`, it returns:
  - the predictions in the `.fitted` column
  - depending on whether the `newdata` or `data` argument was provided, we'd also get `.resid` (and `.std.resid`, `.hat`, `.sigma`, `.cooksd`)
  - it also computes intervals when the `interval` argument is provided.
library(tidymodels)  # attaches broom, parsnip, recipes and workflows
fit_trad <- lm(mpg ~ wt, data = mtcars)
augment(fit_trad, data = mtcars)
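To make the difference between the two arguments concrete, here is a minimal sketch that compares the column names returned for the same `lm` fit. It assumes a reasonably recent version of broom; the object names `aug_data` and `aug_new` are just placeholders for this illustration.

```r
library(broom)

fit_trad <- lm(mpg ~ wt, data = mtcars)

# Training data passed via `data`: the diagnostic columns are added.
aug_data <- augment(fit_trad, data = mtcars)

# Data passed via `newdata`: only prediction-related columns are added.
aug_new <- augment(fit_trad, newdata = mtcars)

# Columns that show up only when `data` is used
setdiff(names(aug_data), names(aug_new))
```

The `setdiff()` call is only there to list which diagnostics depend on the model having seen the rows during fitting.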
- However, if you provide it with a model workflow object:
  - This time you only get the prediction column, and under a different name: `.pred`
  - `augment()` does not accept the `data` argument; it accepts only `new_data` (or `newdata`?! according to the help page).
  - Providing the `interval` argument doesn't seem to do anything.
rec <-
  recipe(mpg ~ wt, data = mtcars)
spec_lm <-
  linear_reg() %>%
  set_engine("lm")
wf <-
  workflow() %>%
  add_recipe(rec) %>%
  add_model(spec_lm) %>%
  fit(data = mtcars)
augment(wf, new_data = mtcars, interval = "confidence")
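Since `interval` appears to be ignored here, one possible workaround is to request the interval through `predict()` with `type = "conf_int"` and bind the pieces together by hand. This is only a sketch: it assumes the engine supports confidence intervals (the `lm` engine does), and `preds` is a placeholder name.

```r
library(tidymodels)

wf <-
  workflow() %>%
  add_recipe(recipe(mpg ~ wt, data = mtcars)) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = mtcars)

# Point predictions and confidence limits come from two separate predict() calls;
# the original columns are bound back on manually.
preds <- bind_cols(
  predict(wf, new_data = mtcars),
  predict(wf, new_data = mtcars, type = "conf_int"),
  mtcars
)

names(preds)
```

Note that the interval columns come back as `.pred_lower` and `.pred_upper`, not the `.lower` / `.upper` names broom uses, so this still does not reproduce the traditional output exactly.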
- Now if you feed it a parsnip object:
  - the results are almost the same as with the workflow object,
  - except this time you also get the `.resid` column, but not `.std.resid`.
  - Unlike with the workflow, here you have to apply the recipe to `new_data` separately. (I just realized this in my original code.)
parsnip <- wf %>% extract_fit_parsnip()
augment(parsnip, new_data = mtcars, interval = "confidence")
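For the "apply the recipe separately" point, a sketch of what that could look like: pull the trained recipe out of the workflow, `bake()` the new data, and only then call `augment()` on the parsnip fit. With the step-free recipe used in this example the baking changes very little, so treat it as an illustration of the pattern; `rec_trained`, `baked` and `aug_parsnip` are placeholder names.

```r
library(tidymodels)

wf <-
  workflow() %>%
  add_recipe(recipe(mpg ~ wt, data = mtcars)) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = mtcars)

rec_trained <- extract_recipe(wf)       # the prepped recipe stored in the workflow
parsnip_fit <- extract_fit_parsnip(wf)  # the parsnip model fit

# Apply the same preprocessing the workflow would have applied internally
baked <- bake(rec_trained, new_data = mtcars)

aug_parsnip <- augment(parsnip_fit, new_data = baked)
names(aug_parsnip)
```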
- So say I still want to use the tidymodels approach but get the results I would have had from a traditional model fit. I'd extract the underlying model fit and plug it into `augment()`.
  This would yield the confidence interval, but:
  - Other columns of the original data are removed!
  - Sometimes (despite plugging in the training data) this approach won't give you `.std.resid`
  - The outcome column name is changed to `..y`
fit_new <- wf %>% extract_fit_engine()
augment(fit_new, new_data = mtcars, interval = "confidence")
This is so confusing. I expected to just plug in the workflow object and new_data and get exactly what I would have got had I plugged in the traditional model fit.
P.S.: the `newdata` / `new_data` argument is also confusing. The help page says the `augment()` argument is `newdata`, but actually:
- If you are using tidymodels objects, you should use `new_data`.
- If you are using a traditional model (example 1), it's `newdata`. In this case, if you don't provide it (or mistakenly pass `new_data`), `augment()` silently uses the `data` used for the model fit.
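A small sketch of that fallback, using a three-row subset so the difference shows up in the row count. It assumes the misspelled argument is swallowed by `...` as described above; `small`, `n_correct` and `n_misspelled` are placeholder names.

```r
library(broom)

fit_trad <- lm(mpg ~ wt, data = mtcars)
small <- head(mtcars, 3)

# Correct argument name for a traditional model: rows of `small` are augmented
n_correct <- nrow(augment(fit_trad, newdata = small))

# Misspelled argument: it is not matched to `newdata`, so the training data is used
n_misspelled <- nrow(augment(fit_trad, new_data = small))

c(correct = n_correct, misspelled = n_misspelled)
```

If the two counts differ, the second call did not use `small` at all, which is exactly the kind of mistake that is easy to miss when switching between broom and tidymodels objects.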