Winsorize Data, Results and Quantiles Question

Welcome to the community!

After winsorization, the minimum (or maximum) observation can remain unchanged, and that's not necessarily wrong. Winsorizing replaces the extreme observations with the chosen quantiles, but that alone doesn't guarantee that the value of the minimum or maximum will change. If there are multiple tied minimum (or maximum) observations in the original data, and their proportion is larger than the proportion you're substituting, then the quantile itself equals the minimum (or maximum) and nothing changes. See below:

library(DescTools)

x <- c(0, 0, 0, 0, 1, 1, 3, 4, 6, 7, 9, 9, 10, 10, 10, 10)

summary(object = x)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    0.75    5.00    5.00    9.25   10.00

(y <- Winsorize(x = x, 
                probs = c(0.1, 0.9))) # extreme example: y is exactly the same as x
#>  [1]  0  0  0  0  1  1  3  4  6  7  9  9 10 10 10 10

summary(object = y)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    0.75    5.00    5.00    9.25   10.00

Created on 2019-07-21 by the reprex package (v0.3.0)
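
To see why nothing changes here, it may help to compute the cut-offs by hand. This is only a minimal base-R sketch, assuming that winsorizing simply clamps the values to the chosen quantiles (using R's default quantile type); it is not necessarily how DescTools does it internally, and the names lims and y_manual are just for illustration.

```r
x <- c(0, 0, 0, 0, 1, 1, 3, 4, 6, 7, 9, 9, 10, 10, 10, 10)

# Cut-offs at the 10% and 90% quantiles (default interpolation)
lims <- quantile(x, probs = c(0.1, 0.9), names = FALSE)
lims
#> [1]  0 10

# Share of observations tied at the minimum and at the maximum:
# both are 25%, which is more than the 10% being replaced in each tail
mean(x == min(x))
#> [1] 0.25
mean(x == max(x))
#> [1] 0.25

# Clamp every value into [lims[1], lims[2]]
y_manual <- pmin(pmax(x, lims[1]), lims[2])

# The cut-offs coincide with min(x) and max(x), so nothing is altered
all(y_manual == x)
#> [1] TRUE
```

Since the 10% quantile is already 0 and the 90% quantile is already 10, there is nothing left to pull in.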

So, in that case, you may want to move the cut-offs inward, i.e. use a higher lower quantile and a lower upper quantile. For example, you could use c(0.3, 0.7). Whether you want to do that, and whether it is justifiable, will probably require much more domain knowledge.
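
As a rough illustration of the c(0.3, 0.7) suggestion on the same data, here is a base-R sketch under the same assumption as before (clamping to the quantiles, default quantile type). I use base R so it doesn't depend on which package provides the winsorizing function; lims2 and z are hypothetical names.

```r
x <- c(0, 0, 0, 0, 1, 1, 3, 4, 6, 7, 9, 9, 10, 10, 10, 10)

# Cut-offs at the 30% and 70% quantiles
lims2 <- quantile(x, probs = c(0.3, 0.7), names = FALSE)
lims2
#> [1] 1 9

# Clamp the values: everything below 1 becomes 1, everything above 9 becomes 9
z <- pmin(pmax(x, lims2[1]), lims2[2])
z
#>  [1] 1 1 1 1 1 1 3 4 6 7 9 9 9 9 9 9

range(z)
#> [1] 1 9
```

Now the minimum and maximum do change, but at the cost of altering half of the observations, which is why the choice needs some justification.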

Hope this helps.

PS: Just a quick question. Are you sure you are using winsorize from the robustHD package? The function in that package takes different arguments, while the arguments you used match the function provided in DescTools.
