Include tidypredict as part of tidymodels workflow for glm models

Uriah · December 10, 2024, 8:59am

Hey all.

I'm working on a prediction model in production and facing some tradeoffs.

I'm using logistic regression, which tends to produce heavy workflow objects and makes the whole flow more complicated. One suggested solution is to use {LiblineaR}, which seems reasonable.

But I want to leverage {tidypredict}. The problem is that it does not support {LiblineaR}, and I'm not sure whether it works properly with {recipes}, etc.

Any thoughts?

Thanks!

Max · December 10, 2024, 3:34pm

Have you used the butcher package on the fitted model? That can slim it down substantially.
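For reference, a minimal sketch of the butcher suggestion, assuming the butcher package is installed. It uses a plain `glm()` on `mtcars` as a stand-in for the real model (the variables are illustrative only), and it makes no claim about how much space is actually saved:

```r
library(butcher)

# Toy logistic regression: manual vs. automatic transmission in mtcars.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# butcher() strips components that are not needed for prediction.
slim <- butcher(fit)

# Compare in-memory sizes before and after (values depend on the data).
print(object.size(fit))
print(object.size(slim))

# Predictions on new data should be unchanged by butchering.
p_full <- predict(fit, newdata = mtcars, type = "response")
p_slim <- predict(slim, newdata = mtcars, type = "response")
```

Checking that `p_full` and `p_slim` agree is a cheap safeguard before shipping the slimmed object.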

There is also tidypredict's successor, orbital. I don't think that it supports LiblineaR but has a lot of other features that might help you out.
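A minimal sketch of the orbital route, assuming the recipes, parsnip, workflows, tibble, and orbital packages are installed. It uses linear regression on `mtcars` purely for illustration; whether a `logistic_reg()` workflow is supported depends on the installed orbital version, so treat that part as something to verify:

```r
library(recipes)
library(parsnip)
library(workflows)
library(orbital)

# A recipe with one preprocessing step, so the sketch covers {recipes} too.
rec_spec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Fit the whole workflow (preprocessing + model) on the training data.
wf_fit <- workflow(rec_spec, linear_reg()) |>
  fit(data = mtcars)

# Convert the fitted workflow into a set of plain expressions.
orbital_obj <- orbital(wf_fit)
print(orbital_obj)

# Predict from the orbital object instead of the full workflow.
preds <- predict(orbital_obj, tibble::as_tibble(mtcars))
```

The orbital object holds only the expressions needed for prediction, which is why it may be a lighter artifact to store than the fitted workflow.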

Uriah · December 10, 2024, 4:10pm

I did try to use butcher, but it doesn't really change a lot for glm models as discussed here:

github.com/tidymodels/butcher: "glm methods appears broken on dev" (issue by EmilHvitfeldt, opened 11:04PM - 29 Aug 22 UTC, closed 06:00PM - 31 Aug 22 UTC)

I'm no longer able to reproduce the results shown in https://github.com/tidymode…ls/butcher/pull/212, despite no changes in the `glm.R` file. I found this problem when trying to answer this SO question https://stackoverflow.com/questions/73529453/file-size-of-tidymodels-workflow

``` r
library(butcher)

more_cars <- mtcars[rep(1:32, each = 1000),]
cars_glm <- glm(mpg ~ ., data = more_cars)

weigh(cars_glm)
#> # A tibble: 63 × 2
#>    object             size
#>    <chr>             <dbl>
#>  1 qr.qr             5.36
#>  2 y                 2.80
#>  3 residuals         2.80
#>  4 fitted.values     2.80
#>  5 linear.predictors 2.80
#>  6 weights           2.80
#>  7 prior.weights     2.80
#>  8 effects           0.513
#>  9 model.mpg         0.256
#> 10 model.cyl         0.256
#> # … with 53 more rows
#> # ℹ Use `print(n = ...)` to see more rows

butchered <- butcher(cars_glm)

sum(weigh(cars_glm)$size)
#> [1] 28.325
sum(weigh(butchered)$size)
#> [1] 19.91117

weigh(butchered)
#> # A tibble: 53 × 2
#>    object             size
#>    <chr>             <dbl>
#>  1 qr.qr             5.36
#>  2 residuals         2.80
#>  3 linear.predictors 2.80
#>  4 weights           2.80
#>  5 prior.weights     2.80
#>  6 effects           0.513
#>  7 model.mpg         0.256
#>  8 model.cyl         0.256
#>  9 model.disp        0.256
#> 10 model.hp          0.256
#> # … with 43 more rows
#> # ℹ Use `print(n = ...)` to see more rows
```

<sup>Created on 2022-08-29 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

I'll try to use update_role() instead of my old "outcome~." syntax but I don't think it would make a big difference.

The thing is that I need to save the model object somehow. My company uses MLflow, and it's kind of weird to find out that linear models are way heavier than Python's version of LightGBM.

My alternative approach is just to save the model object in my Docker image and keep track of the coefficients and the original names of the training data in our DB.

Should be deterministic I guess.
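A minimal base-R sketch of that coefficient-based fallback, assuming numeric predictors only (no factors, interactions, or recipe preprocessing). The helper `score_from_coefs()` and the `coef_tbl` layout are hypothetical names made up for illustration; `mtcars` stands in for the real training data:

```r
# Toy logistic regression standing in for the production model.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# What would be persisted in the DB: term names and their estimates.
coef_tbl <- data.frame(
  term = names(coef(fit)),
  estimate = unname(coef(fit)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rebuild probabilities from stored coefficients alone.
score_from_coefs <- function(coef_tbl, new_data) {
  predictors <- setdiff(coef_tbl$term, "(Intercept)")
  # Design matrix with an explicit intercept column.
  X <- cbind(`(Intercept)` = 1,
             as.matrix(new_data[, predictors, drop = FALSE]))
  # Align coefficients to the column order, then form the linear predictor.
  beta <- coef_tbl$estimate[match(colnames(X), coef_tbl$term)]
  eta <- drop(X %*% beta)
  # Inverse logit link gives the predicted probability.
  plogis(eta)
}

manual <- score_from_coefs(coef_tbl, mtcars)
reference <- predict(fit, newdata = mtcars, type = "response")
```

Comparing `manual` against `reference` once at training time gives some confidence that the stored coefficients reproduce the model's predictions.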

I'll try to use orbital as well. Thanks!
