This is perhaps a somewhat hypothetical question, but I came across it recently at work and don't have a clear answer in mind.
The dataset is small and highly imbalanced. We have many numerical variables and one categorical variable that specifies the type of business, with four distinct categories. Only one of them (let's call it category X) is relevant for future predictions, since the product is tailored only to category X. It is worth noting that predictive power differs significantly across the levels of that categorical variable. The best approach would be to include only category X in the training sample. However, because of the high imbalance, there would not be enough observations of the target variable's top class, and the entire set would become very small. We eventually decided to train two models on the entire set (including all levels of the categorical variable): one including the categorical variable and one excluding it.
At the moment we have two models:
 With the categorical variable (called A)
 Without the categorical variable (called B)
Model A performs slightly better than model B because the categorical variable is generally relevant to the problem. However, we should also consider its generalization power with regard to the specific type of clients it is built for.
The main question is: which model is more suited for this given problem?

Is it model A, because it includes the categorical variable and so accounts for differences in performance across the levels of that variable?

Is it model B, because it excludes the categorical variable, which makes it more suitable for future predictions (the categorical variable will always have only one level) and could lead to better generalization?
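To make the comparison concrete, here is a minimal sketch of what I mean by checking each model on category X specifically rather than on the whole set. Everything in it is made up for illustration: the labels, the scores, the category names other than X, and the helper names `roc_auc` and `auc_for_category` are hypothetical and are not our real data or results. AUC is used only as an example metric; with data this imbalanced, another metric may well be more appropriate.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_for_category(y_true, scores, categories, target) -> float:
    """ROC AUC restricted to the rows whose category equals `target`."""
    idx = [i for i, c in enumerate(categories) if c == target]
    return roc_auc([y_true[i] for i in idx], [scores[i] for i in idx])


# Hypothetical held-out rows: business type, true label, and the scores
# that model A (with the categorical variable) and model B (without it)
# would assign. These numbers are invented purely to show the mechanics.
categories = ["X", "X", "X", "X", "Y", "Y", "Z", "W"]
y_true = [1, 0, 0, 1, 1, 0, 0, 1]
scores_a = [0.9, 0.2, 0.6, 0.5, 0.8, 0.1, 0.3, 0.7]
scores_b = [0.8, 0.3, 0.4, 0.6, 0.4, 0.5, 0.4, 0.5]

# Performance over all business types versus category X only.
auc_a_all = roc_auc(y_true, scores_a)
auc_b_all = roc_auc(y_true, scores_b)
auc_a_x = auc_for_category(y_true, scores_a, categories, "X")
auc_b_x = auc_for_category(y_true, scores_b, categories, "X")

print("all categories: A =", auc_a_all, " B =", auc_b_all)
print("category X only: A =", auc_a_x, " B =", auc_b_x)
```

The point of the sketch is only that the ranking of two models on the full set and on category X alone need not agree, so the slightly better overall performance of model A does not by itself settle the question.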
I realize that my question is a bit hypothetical, but I'm not able to disclose more information. I would be very grateful if you could share your thoughts or articles that would help us make a decision. I'm not giving my preference at the moment so as not to bias your answers. Thank you!