Winsorize Data, Results and Quantiles Question

Hey there,

I am new to R and want to winsorize my data. I would like to use the function "winsorize" at 5% and 95%, which is included in the robustHD package, but I am wondering about the results. Why is the Min. still 14? And how can I change the limits to the 10% and 90% percentiles?
I tried changing the argument "probs = c(0.05, 0.95)" to "probs = c(0.10, 0.90)", but the results were the same.

summary(Data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   14.0    59.0   139.5   250.4   315.0  1743.0 
> Data.Robust= winsorize(Data, minval = NULL, maxval = NULL, probs = c(0.05, 0.95),  na.rm = FALSE, round(5))
> Data.Robust=round(Data.Robust, digits = 0)
> summary(Data.Robust)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   14.0    59.0   139.5   186.4   315.0   418.0 

Thank you very much

Welcome to the community!

After winsorization, the minimum (or maximum) observation can remain unchanged, and that's not necessarily wrong. Winsorizing replaces the values beyond the chosen quantiles with those quantiles, but that alone doesn't guarantee that the minimum or maximum will change. If there are multiple minimum (or maximum) observations in the original data, and their proportion is larger than the proportion you're replacing, then the quantile equals the minimum (or maximum) itself and nothing changes. See below:

library(DescTools)

x <- c(0, 0, 0, 0, 1, 1, 3, 4, 6, 7, 9, 9, 10, 10, 10, 10)

summary(object = x)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    0.75    5.00    5.00    9.25   10.00

(y <- Winsorize(x = x, 
                probs = c(0.1, 0.9))) # extreme example, as y remains exactly the same as x
#>  [1]  0  0  0  0  1  1  3  4  6  7  9  9 10 10 10 10

summary(object = y)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    0.75    5.00    5.00    9.25   10.00

Created on 2019-07-21 by the reprex package (v0.3.0)

So, in that case, you may want to move the cut-offs inward, i.e. use a higher lower quantile and a lower upper quantile. Maybe you can use c(0.3, 0.7). Whether you want to do that, and whether it is justifiable, will probably require much more domain knowledge.
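
To see what moving the cut-offs inward does to the toy vector above, here is a minimal base R sketch. The helper `winsorize_quantile()` is hypothetical (it is not a function from robustHD or DescTools); it assumes that winsorizing simply means clamping values to the empirical quantiles returned by `quantile()` with its default settings.

```r
# Hypothetical helper, not part of robustHD or DescTools:
# clamp every value to the empirical quantiles given by `probs`.
winsorize_quantile <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

x <- c(0, 0, 0, 0, 1, 1, 3, 4, 6, 7, 9, 9, 10, 10, 10, 10)

# With (0.1, 0.9) both limits coincide with the existing min and max,
# so nothing is replaced.
y_wide <- winsorize_quantile(x, probs = c(0.1, 0.9))
range(y_wide)
#> [1]  0 10

# With (0.3, 0.7) the limits move inside the data, so the tails are replaced.
y_narrow <- winsorize_quantile(x, probs = c(0.3, 0.7))
range(y_narrow)
#> [1] 1 9
```

Here the four zeros and four tens make up 25% of the data on each side, so the limits only start to bite once the probabilities pass that share.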

Hope this helps.

PS Just a quick question: are you sure you are using winsorize from the robustHD package? The function in that package has different arguments, while your arguments match the function provided in DescTools.
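
For context, my understanding is that robustHD's winsorize does not work from quantiles at all: by default it appears to shrink values towards median ± 2 × MAD. Please treat that as an assumption and check ?winsorize in robustHD. It would be consistent with `probs` having no effect on your results. The base R sketch below only illustrates that idea; `winsorize_mad()` is a hypothetical helper, and `z` is made-up data, not yours.

```r
# Hypothetical helper illustrating median/MAD-based winsorization.
# Assumption: robustHD::winsorize() behaves roughly like this by default
# (center = median, scale = MAD, cut-off constant 2); verify in its docs.
winsorize_mad <- function(x, const = 2) {
  center <- median(x, na.rm = TRUE)
  scale <- mad(x, na.rm = TRUE)  # stats::mad(), scaled by 1.4826 by default
  pmin(pmax(x, center - const * scale), center + const * scale)
}

# Made-up right-skewed data for illustration only
z <- c(14, 20, 35, 59, 80, 120, 160, 250, 315, 900, 1743)

w <- winsorize_mad(z)
summary(w)
```

With right-skewed data like this, the lower limit (median minus 2 × MAD) falls below the smallest observation, so the minimum stays untouched while the largest values are pulled in. No quantile is involved, which is why a quantile-based function can give quite different limits on the same data.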


Hi Yarnabrina,

thanks for your support :slight_smile: I understand the Min/Max value now.

Yes, I used the robustHD package, but I have also tried DescTools now and got different results. What is the difference between these two? The DescTools package seems to be the right one.

> #robustHD
> summary(winsorize(Data, probs = c(0.05, 0.95)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   14.0    59.0   139.5   186.4   315.0   418.2 
#DescTools
> summary(Winsorize(Data, probs = c(0.05, 0.95)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   23.0    59.0   139.5   229.3   315.0   770.4 
>
