As the title of this post suggests, I am asking about strategies for data spending when developing models on small datasets.
In my area of work it is not uncommon to run a study on a niche topic where participants are very difficult or very expensive to find (perhaps only 75-150 records or so).
In these situations the primary purpose of modeling is to understand the effects the predictors have on the outcome, and not to deploy a predictive model. That being said, I understand the importance of building a model that generalizes well, even if that isn't the primary goal.
Can anyone suggest best practices, strategies, or resources on the topic of data spending (e.g., train/test splits, cross-validation, etc.) with small data?
Thank you, this makes sense. In light of your suggestion, do you think there is ever a right time and place for leave-one-out cross-validation? I rarely (if ever) use it, both for fear of an overoptimistic measure of accuracy and because it produces only a single performance estimate rather than a distribution.
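To show what I mean about a single estimate versus a distribution, here is a rough sketch in Python with scikit-learn; the dataset and model are made-up stand-ins for a small study of ~100 records, not from my actual work.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, RepeatedKFold, cross_val_score

# Stand-in for a small study: 100 records, a handful of predictors
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=1)
model = LinearRegression()

# Leave-one-out: each "test set" is a single observation, so the per-fold
# scores aren't useful on their own; you end up with one aggregate estimate.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
print("LOO MSE (single aggregate):", -loo_scores.mean())

# Repeated 10-fold: every resample yields a usable score, so you get a
# distribution (mean and spread) for the performance estimate.
rkf = RepeatedKFold(n_splits=10, n_repeats=10, random_state=1)
rkf_scores = cross_val_score(model, X, y, cv=rkf,
                             scoring="neg_mean_squared_error")
print("Repeated 10-fold MSE: mean %.2f, sd %.2f"
      % (-rkf_scores.mean(), rkf_scores.std()))
```

The repeated 10-fold spread is the part I find useful with small samples, which is why LOO has felt like a step backwards to me.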