Employee Churn Prediction "Outlier Decision"

Hi, I'm working on an Employee Churn Prediction project and I have a question about the salary outliers:
I have a dataset of 960 values, and 110 of them are salary outliers. My question is what to do with them. I don't want to eliminate the outliers because they are a big part of the total, but if I leave them in they could negatively affect my prediction model. On the other hand, if I modify them, for example by replacing them with the average, that would also affect my final results, because I would no longer be working with the original data.
What would you recommend in this case? Thank you very much in advance.

What makes you think those 110 observations are "outliers"? Maybe you have a salary distribution with a long tail (or tails). Is there a missing variable that would help explain why over 10% of the population has such different salaries? I would try not to discard or substitute them just yet.

Gotta tell ya, if over 10% of your data is "outliers", then they ain't outliers. That's just your data.

Try transforming the data, for example using a log transform, and they will probably not look like "outliers" anymore.
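To make the log-transform suggestion concrete, here is a rough sketch. It uses synthetic, right-skewed salaries (an assumption, not the data from the original post) and a hypothetical helper `iqr_outlier_count` that applies the usual 1.5 * IQR boxplot rule. Quartile definitions vary slightly between tools, so the counts may not match a given plotting library exactly.

```python
import math
import random
import statistics


def iqr_outlier_count(values):
    """Count values outside the 1.5 * IQR fences (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < lower or v > upper)


# Synthetic long-tailed salaries: an assumption for illustration only.
rng = random.Random(0)
salaries = [rng.lognormvariate(10.5, 0.8) for _ in range(960)]

# The log transform requires strictly positive salaries.
log_salaries = [math.log(s) for s in salaries]

raw_flagged = iqr_outlier_count(salaries)
log_flagged = iqr_outlier_count(log_salaries)
print(f"flagged on raw scale: {raw_flagged}")
print(f"flagged on log scale: {log_flagged}")
```

If the real salary column behaves similarly, far fewer points should be flagged on the log scale, and the model can be fed the log salary as a feature without dropping any rows.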

It might be that you need one model for the majority population and another for your top performers.

Yeah, maybe it's because of the job they're working in. But I did look at the boxplot:

Thanks mate, I will try that.

But just because they are marked as outliers in a boxplot (meaning they lie more than 1.5 times the IQR below the first quartile or above the third quartile) doesn't mean you should remove them from your data set. They carry important information, and as @phil_hummel already pointed out, there might be a good and maybe even interesting explanation for them that would be worth investigating further instead of ignoring them.
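For reference, the boxplot rule can be reproduced by hand, and the result kept as a flag instead of a filter. This is only a sketch: the salary figures are made up, and `boxplot_fences` and the `outside_fences` field are hypothetical names chosen for illustration.

```python
import statistics


def boxplot_fences(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up salaries in thousands, for illustration only.
salaries_k = [30, 32, 35, 36, 38, 40, 41, 43, 45, 120]
lower, upper = boxplot_fences(salaries_k)

# Keep every row; just record whether it falls outside the fences.
records = [
    {"salary": s, "outside_fences": not (lower <= s <= upper)}
    for s in salaries_k
]
flagged = [r["salary"] for r in records if r["outside_fences"]]
print(flagged)
```

Keeping the flag as an extra column leaves the original data intact and makes it easy to check whether the flagged employees share a job level, department, or some other variable that explains their salaries.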

