I would like it to try only different penalty and mixture values for glmnet, so I tried both resamples = NULL and removing the argument, without success. How can I make it ignore the resamples argument?
I should rephrase: which rows of the data do you want for prediction and modeling?
tidymodels avoids predicting on data that was used to fit a model, since that can easily lead to overfitting. Almost all of our resampling objects (which are required here) separate the two data sets.
Oh I see -- df is already the train data without test data (let's say df_train and df_test). I would like it to fit on all data (train + test, say df_all). Please let me know if that answers your question.
Sorry, before your questions I didn't quite understand the relevance, but I should have anticipated this. It now makes a bit more sense why a NULL resamples argument doesn't work.
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
tidymodels_prefer()
theme_set(theme_bw())
# best: subsplit the training data to use a validation set
set.seed(1)
resample <- validation_split(mtcars)
resample
#> # Validation Set Split (0.75/0.25)
#> # A tibble: 1 × 2
#> splits id
#> <list> <chr>
#> 1 <split [24/8]> validation
# not great: put the training set in both training and test.
# we'll do this via bootstrapping with an extra option
set.seed(1)
resample <-
  bootstraps(mtcars, times = 1, apparent = TRUE) %>%
  filter(id == "Apparent")
resample
#> # A tibble: 1 × 2
#> splits id
#> <list> <chr>
#> 1 <split [32/32]> Apparent
res <-
  linear_reg() %>%
  # Same code works for tune_grid
  fit_resamples(mpg ~ ., resamples = apparent(mtcars))
I'm just discovering that when using vfold_cv() or sliding_window() etc. instead of bootstrap with the same code, the output is model weights, not covariate weights. I guess only some resampling methods or only bootstrap returns covariate weights. Bit of a bummer, but oh well, bootstrapping it is.
If you want the coefficients from the fitted models, you can get those by extracting the models, i.e. something like control = control_grid(extract = function(x) x) in tune_grid (if this gets too big, I think you could use butcher to trim what you return so that only the coefficients are kept). Then you can look at tune_results_object$.extracts and process that further. Or is that not what you meant?
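To make the extract suggestion concrete, here is a minimal sketch. It assumes mtcars as stand-in data, a small made-up grid, and a hypothetical helper named get_glmnet_coefs; none of these come from the original question. The helper returns only the tidied coefficients rather than the whole fitted object, which keeps the results small.

```r
library(tidymodels)

# Assumption: mtcars and two bootstrap resamples stand in for the real data
set.seed(1)
bt <- bootstraps(mtcars, times = 2)

glmnet_spec <-
  linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

# Hypothetical small grid, for illustration only
glmnet_grid <- crossing(penalty = c(0.01, 0.1), mixture = c(0.5, 1))

# Hypothetical helper: pull the glmnet fit out of the workflow and tidy it
get_glmnet_coefs <- function(x) {
  x %>%
    extract_fit_engine() %>%
    tidy(return_zeros = TRUE) %>%
    rename(penalty = lambda)
}

glmnet_res <-
  tune_grid(
    glmnet_spec,
    mpg ~ .,
    resamples = bt,
    grid = glmnet_grid,
    control = control_grid(extract = get_glmnet_coefs)
  )

# One tibble of extracted results per resample
glmnet_res$.extracts[[1]]
```

Each element of the inner .extracts column should be the tidied coefficient table for one glmnet fit.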
But wait! We know that each glmnet fit contains all of the coefficients. This means that, for a specific resample and value of mixture, the extracted results are the same for every penalty value:
all.equal(
# First bootstrap, first `mixture`, first `penalty`
glmnet_res$.extracts[[1]]$.extracts[[1]],
# First bootstrap, first `mixture`, second `penalty`
glmnet_res$.extracts[[1]]$.extracts[[2]]
)
#> [1] TRUE
I've been trying to figure out the reason for the lack of change with different values of penalty as mentioned above, but I haven't had much luck. Why would different values of penalty result in the same coefficients? In the last graph, it seems to show different values for different penalties: https://www.tidymodels.org/learn/models/coefficients/figs/glmnet-plot-1.svg
glmnet models produce coefficients for all values of the penalty in each model fit. The tidymodels infrastructure returns a row for each penalty, but the tidy method produces coefficients for all of them, so there will be replicate values. Take a look at this document for more details.
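A quick way to see this outside of tidymodels is to call glmnet directly. The sketch below assumes mtcars as example data and an arbitrary mixture (alpha) of 0.5; a single fit holds the whole penalty path, and coefficients at any particular penalty are read from that one object.

```r
library(glmnet)

# Assumption: mtcars as example data, mpg as the outcome
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

# One call fits the full regularization path for a fixed mixture (alpha)
fit <- glmnet(x, y, alpha = 0.5)

# Number of penalty (lambda) values stored in this single fit
length(fit$lambda)

# Coefficient matrix: one row per term (incl. intercept), one column per penalty
dim(coef(fit))

# Coefficients at two specific penalties, both taken from the same fit
coefs <- coef(fit, s = c(0.1, 1))
coefs
```

This is why extracting the fitted object once per penalty row gives identical objects: the penalty only matters when you pick which column of the path to look at.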