Let's say I have a data set called D with n rows and m columns. Some of the columns are categorical variables, and I need to analyze a variable with respect to those categorical variables. How do I remove the outliers from the entire data set? I tried rm.outlier() from the outliers package, but it doesn't work the way I want: it returns a new array instead of removing the entire row where the outlier is.
I'll prefix what I really want to say with an initial comment that an outlier is a somewhat subjective and context-dependent notion...
Having gotten that out of the way, I'd like to ask how you came to this idea of removing columns that contain outlying values? It seems like a recipe for removing all your columns...
OK, that's fine.
Do you have any particular definition of outlier that makes sense for your context (which you haven't shared with us yet) that you wish to apply?
A value below the first quartile minus 1.5 times the IQR, or above the third quartile plus 1.5 times the IQR.
They are the dots drawn by boxplots, as I understand it.
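With that definition, one possible approach is to flag outliers column by column and then drop every row that has at least one flag. The following is only a minimal sketch in base R: the helper names is_outlier() and remove_outlier_rows(), the multiplier argument k, and the toy data frame are made up for illustration, and it assumes that only numeric columns should be checked while categorical columns are left alone.

```r
# Hypothetical helper: TRUE where x lies outside the k * IQR fences
# (k = 1.5 matches the definition given above). NA values are never flagged.
is_outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# Hypothetical helper: drop every row of D that contains an outlier
# in any numeric column; non-numeric columns are ignored.
remove_outlier_rows <- function(D, k = 1.5) {
  num_cols <- vapply(D, is.numeric, logical(1))
  flags <- vapply(D[num_cols], is_outlier, logical(nrow(D)), k = k)
  flags <- matrix(flags, nrow = nrow(D))  # keep matrix shape in edge cases
  D[rowSums(flags) == 0, , drop = FALSE]
}

# Toy data (invented for this example): one categorical, one numeric column.
D <- data.frame(
  group = c("a", "a", "b", "b", "b", "a"),
  value = c(10, 12, 11, 13, 100, 12)
)
clean <- remove_outlier_rows(D)
```

Here the row with value 100 is removed and the other five rows are kept. Two caveats: boxplot() computes its fences from hinges (see boxplot.stats()), which can differ slightly from quantile() on small samples, and if outliers should be judged within each level of a categorical variable rather than over the whole column, the flagging step would have to be applied per group instead.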