Uriah
December 10, 2024, 8:59am
1
Hey all.
I'm working on a prediction model in production and facing some tradeoffs.
I'm using logistic regression, and the fitted workflows objects tend to be heavy, which makes the whole flow more complicated. One suggested solution is to use {LiblineaR}, which seems reasonable.
But I'd like to leverage {tidypredict}. The problem is that it does not support {LiblineaR}, and I'm not sure whether it works properly with {recipes}, etc.
Any thoughts?
Thanks!
Max
December 10, 2024, 3:34pm
2
Have you used the butcher package on the fitted model? That can slim it down substantially.
There is also tidypredict's successor, orbital. I don't think it supports LiblineaR, but it has a lot of other features that might help you out.
Uriah
December 10, 2024, 4:10pm
3
I did try butcher, but it doesn't change much for glm models, as discussed in this GitHub issue:
(Issue opened 11:04PM, 29 Aug 22 UTC; closed 06:00PM, 31 Aug 22 UTC.)
I'm no longer able to reproduce the results shown in https://github.com/tidymode… ls/butcher/pull/212, despite no changes in the `glm.R` file.
I found this problem when trying to answer this SO question https://stackoverflow.com/questions/73529453/file-size-of-tidymodels-workflow
``` r
library(butcher)
more_cars <- mtcars[rep(1:32, each = 1000),]
cars_glm <- glm(mpg ~ ., data = more_cars)
weigh(cars_glm)
#> # A tibble: 63 × 2
#> object size
#> <chr> <dbl>
#> 1 qr.qr 5.36
#> 2 y 2.80
#> 3 residuals 2.80
#> 4 fitted.values 2.80
#> 5 linear.predictors 2.80
#> 6 weights 2.80
#> 7 prior.weights 2.80
#> 8 effects 0.513
#> 9 model.mpg 0.256
#> 10 model.cyl 0.256
#> # … with 53 more rows
#> # ℹ Use `print(n = ...)` to see more rows
butchered <- butcher(cars_glm)
sum(weigh(cars_glm)$size)
#> [1] 28.325
sum(weigh(butchered)$size)
#> [1] 19.91117
weigh(butchered)
#> # A tibble: 53 × 2
#> object size
#> <chr> <dbl>
#> 1 qr.qr 5.36
#> 2 residuals 2.80
#> 3 linear.predictors 2.80
#> 4 weights 2.80
#> 5 prior.weights 2.80
#> 6 effects 0.513
#> 7 model.mpg 0.256
#> 8 model.cyl 0.256
#> 9 model.disp 0.256
#> 10 model.hp 0.256
#> # … with 43 more rows
#> # ℹ Use `print(n = ...)` to see more rows
```
<sup>Created on 2022-08-29 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
I'll try update_role() instead of my old "outcome ~ ." formula syntax, but I don't think it will make a big difference.
The thing is, I need to save the model object somehow. My company uses MLflow, and it's odd to find that linear models are way heavier than Python's version of LightGBM.
My alternative approach is to just save the model object in my Docker image and keep track of the coefficients and the original names of the training data in our DB.
Should be deterministic I guess.
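A minimal base R sketch of that coefficients-only idea, assuming numeric predictors with no factors, interactions, or recipe preprocessing. The `mtcars` data, the `am ~ wt + hp` formula, and the helper `predict_from_coefs()` are illustrative assumptions, not the production model:

```r
# Fit a small logistic regression on a built-in dataset (illustrative only).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Keep only what is needed for scoring: term names and coefficients.
# A small table like this could be written to a database.
coef_tbl <- data.frame(
  term = names(coef(fit)),
  estimate = unname(coef(fit)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: score new data from the stored coefficients alone.
# Assumes every non-intercept term is a numeric column in new_data.
predict_from_coefs <- function(coef_tbl, new_data) {
  intercept <- coef_tbl$estimate[coef_tbl$term == "(Intercept)"]
  slopes <- coef_tbl[coef_tbl$term != "(Intercept)", ]
  x <- as.matrix(new_data[, slopes$term, drop = FALSE])
  eta <- intercept + as.vector(x %*% slopes$estimate)  # linear predictor
  plogis(eta)                                          # inverse logit link
}

# Compare against the full model object's predictions.
manual <- predict_from_coefs(coef_tbl, mtcars)
reference <- unname(predict(fit, newdata = mtcars, type = "response"))
all.equal(manual, reference)

# Rough size comparison between the stored table and the full fit.
object.size(coef_tbl)
object.size(fit)
```

If the real model includes factor predictors or recipe steps, those transformations would have to be reproduced before scoring, so this only covers the simplest case.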
I'll try to use orbital as well. Thanks!