Hi Posit Community.
I'm currently evaluating case weights in tidymodels, and my review of the publicly available documentation has led to the following questions:
- I've used `show_model_info()` thus far to investigate individual model types and their ability to support case weights. However, is there a comprehensive document that outlines all models that support case weights? Rather than checking each model type individually, I wanted to make sure I wasn't missing a resource that outlines all model types that support case weights.
- Secondly, how do `importance_weights()` impact model estimation? I haven't been able to uncover this detail yet as much of the documentation I've reviewed describes why case weights should be used. What I'm most interested in is how weights impact the model construction process. I assume the implementation may change slightly by model type, but what is happening under the hood with `importance_weights()`? For example, how do observations with higher weight values impact splits in tree-based methods?
- Lastly, the answer above will likely influence this question, but what advantages are there to leveraging observations with low case weights? Depending upon how `importance_weights()` are used during the model construction process, I'm wondering if it would be wiser to simply drop the observations with low weights.
For added context, the data I am working with is still developing. That is, there is a time component similar to what is outlined here. The importance weights we plan to use would emphasize fully developed observations and minimize the impact of newer records. In summary, case weights would allow our data science team to leverage as much data as possible while also emphasizing observations that have fully matured.
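To make the setup concrete, here is a minimal sketch of the kind of weighting I have in mind. Everything in it is illustrative rather than our real pipeline: the `claims` data, the `months_developed` column, and the 12-month cap are assumptions made up for this example, and the linear ramp is just one possible weighting scheme.

```r
library(tidymodels)

# Hypothetical data: `months_developed` is an assumed column name that
# measures how long each record has had to mature.
set.seed(123)
claims <- tibble(
  months_developed = sample(1:24, 200, replace = TRUE),
  x = rnorm(200),
  y = 2 * x + rnorm(200)
)

# Assumed scheme: weight rises linearly with maturity and is capped at 1
# once a record is 12 or more months old, so newer records count less.
claims <- claims %>%
  mutate(
    case_wts = hardhat::importance_weights(pmin(months_developed / 12, 1))
  )

# Tell the workflow which column holds the case weights, then fit.
wflow <- workflow() %>%
  add_case_weights(case_wts) %>%
  add_formula(y ~ x) %>%
  add_model(linear_reg())

wflow_fit <- fit(wflow, data = claims)
wflow_fit
```

The alternative I'm weighing against this would be filtering out the rows with small `case_wts` before fitting instead of passing the weights through.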
Any thoughts from the community and/or the contributors to tidymodels would be greatly appreciated.
Thanks,
Mitch