Hi! I'm new to R and would like to winsorize my data, since trimming is not an option given my limited number of observations.
My data looks like this (131 observations in total):
  company   id    rev    size age
1 Adeg    29.9   0.66     160  45
2 Agrana  32.0   2.80    9191  29
3 Allianz 36.5  87.75  142460 128
4 Andritz 34.0   6.89   29096 118
5 Apple   41.0 259.65  132000  41
I would like to winsorize the variable rev, i.e. winsorize my data by column (not the whole dataset). If I use winsorize(data1), an error appears because the variable company is not numeric.
How can I winsorize the data by column?
To avoid this problem, I created a second dataset, data2, without the variable company. When I apply winsorize(data2), I get a different error:
winsorize(data2)
Error in eigen(R, symmetric = TRUE) : infinite or missing values in 'x'
How can I winsorize my data correctly?
I would be very happy if someone could help me out!
You can replace the data set and variables with your own. Note: I assumed you were using the Winsorize() function from the DescTools package, because you didn't specify which package your function comes from.
I recommend consulting the help page for the winsorize function with ?winsorize. It is a good starting point whenever you are not sure how to use a function or what it returns. The section titled Value describes what the function returns.
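As a minimal sketch of winsorizing just one column while leaving company and the other variables untouched, the base R code below uses a hypothetical helper winsorize_vec() that clamps values to their 5th and 95th percentiles. Those limits are an assumption chosen to mirror the defaults of DescTools::Winsorize(); with DescTools installed, data1$rev <- DescTools::Winsorize(data1$rev) should do the same job, but check the defaults of your installed version. Only the five rows shown in the question are used, so the numbers are purely illustrative.

```r
# Hypothetical helper: clamp a numeric vector to its lower/upper quantiles.
# probs = c(0.05, 0.95) is an assumed default, not a requirement.
winsorize_vec <- function(x, probs = c(0.05, 0.95)) {
  stopifnot(is.numeric(x))
  # quantile() uses type 7 by default; names = FALSE drops the "5%" labels
  lims <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Raise values below the lower limit, lower values above the upper limit;
  # NA values stay NA
  pmin(pmax(x, lims[1]), lims[2])
}

# The five example rows from the question
data1 <- data.frame(
  company = c("Adeg", "Agrana", "Allianz", "Andritz", "Apple"),
  id      = c(29.9, 32.0, 36.5, 34.0, 41.0),
  rev     = c(0.66, 2.80, 87.75, 6.89, 259.65),
  size    = c(160, 9191, 142460, 29096, 132000),
  age     = c(45, 29, 128, 118, 41)
)

# Winsorize only the rev column; company, id, size and age are unchanged
data1$rev <- winsorize_vec(data1$rev)
print(data1)
```

With only five rows, the 5% and 95% limits fall between the two smallest and the two largest values, so only Adeg and Apple change.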
If standardize is TRUE and return is "weights", a set of data cleaning weights. Multiplying each observation of the standardized data by the corresponding weight yields the cleaned standardized data.
Otherwise an object of the same type as the original data x containing the cleaned data is returned.
So, depending on how you call the function, you get either the weights or the cleaned data. Based on how you called the function in the above code, those values are the cleaned data (i.e. the original data with the outliers shrunk).
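If what you want is univariate winsorizing, i.e. each numeric column clamped to its own percentiles while non-numeric columns such as company are left alone, here is a minimal base R sketch. The helper winsorize_vec() and the 5%/95% limits are assumptions for illustration, not part of the function whose documentation is quoted above.

```r
# Hypothetical helper: clamp a numeric vector to its own quantile limits
winsorize_vec <- function(x, probs = c(0.05, 0.95)) {
  lims <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, lims[1]), lims[2])
}

data1 <- data.frame(
  company = c("Adeg", "Agrana", "Allianz", "Andritz", "Apple"),
  id      = c(29.9, 32.0, 36.5, 34.0, 41.0),
  rev     = c(0.66, 2.80, 87.75, 6.89, 259.65),
  size    = c(160, 9191, 142460, 29096, 132000),
  age     = c(45, 29, 128, 118, 41)
)

# Flag the numeric columns; company (character) is skipped
num_cols <- vapply(data1, is.numeric, logical(1))

# Winsorize each numeric column separately; the result is still a data frame
data_w <- data1
data_w[num_cols] <- lapply(data1[num_cols], winsorize_vec)
str(data_w)
```

Because each column gets its own limits, an extreme value in size has no influence on how rev is treated, which is the "by column" behaviour the question asks for.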
Thank you, @mattwarkentin!
So I also tried the Winsorize() function from the DescTools package. Since I created a single vector containing my variable of interest, I tried to run the function as follows: Winsorize(rev_vector), where rev_vector is numeric.
R gives me the following error:
The issue is that Winsorize() only accepts a single numeric vector as its first argument. A data frame with only one variable is still a data frame, not a vector. Based on the error you are getting, it seems that your variable is still a column inside a data frame. You will need to extract the vector from the data frame like so:
DescTools::Winsorize(rev_vector$rev) # where rev is the name of the variable
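To see the difference described above, the short base R check below builds a one-column data frame (the name rev_vector is taken from the question; the values are the five example rows) and compares it with the vector extracted from it. $ and [[ ]] return the underlying vector, while [ with drop = FALSE keeps the data frame wrapper.

```r
# A data frame with a single column is still a data frame
rev_vector <- data.frame(rev = c(0.66, 2.80, 87.75, 6.89, 259.65))

is.data.frame(rev_vector)          # TRUE
is.numeric(rev_vector)             # FALSE: a data frame is a list of columns

# $ and [[ ]] extract the column itself as a plain numeric vector
is.numeric(rev_vector$rev)         # TRUE
is.numeric(rev_vector[["rev"]])    # TRUE

# Single-bracket indexing with drop = FALSE keeps the data frame
is.data.frame(rev_vector[, "rev", drop = FALSE])  # TRUE
```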
Or use an approach that ensures the function is run inside the context of the data frame. Here are two such examples:
# Base R approach
with(rev_vector, DescTools::Winsorize(rev))
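Note that with() only returns the winsorized vector; it does not modify rev_vector. If you want to keep the result inside the data frame, base R's transform() also evaluates the expression in the data frame's context and returns a modified copy. The sketch below uses a hypothetical helper winsorize_vec() as a stand-in for DescTools::Winsorize() so it runs without extra packages; the new column name rev_w is likewise only for illustration.

```r
# Hypothetical stand-in for DescTools::Winsorize() with 5%/95% limits
winsorize_vec <- function(x, probs = c(0.05, 0.95)) {
  lims <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, lims[1]), lims[2])
}

rev_vector <- data.frame(rev = c(0.66, 2.80, 87.75, 6.89, 259.65))

# with() evaluates inside the data frame and returns only the vector
w1 <- with(rev_vector, winsorize_vec(rev))

# transform() returns a copy of the data frame with the new column added;
# assign it back to keep the result
rev_vector <- transform(rev_vector, rev_w = winsorize_vec(rev))
print(rev_vector)
```

Keeping the original rev next to rev_w makes it easy to compare the raw and winsorized values before overwriting anything.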