Hello, I am stuck on how to use the weights of a survey.

I have a vector with the weight of each observation, that is, how many people each observation represents. On the other hand, I have the data for each variable. To see all the values properly, I think I should weight each variable by the weight vector (as simple as a multiplication).

But things get complicated when I want to plot some results, such as a histogram or a density plot, because I don't know how to specify that, after the multiplication (variable * weight), each new observation should represent more than one person in the histogram. Any help?

`geom_histogram()` accepts a `weight =` aesthetic; see https://ggplot2.tidyverse.org/reference/geom_histogram.html

If you want more help than this, you'll really have to post some example code.

Thanks. Let's suppose these are my data:

```
library(tidyverse)

my_data <- tibble(
  Var_1      = c(900, 1500, 350, 1200, 750, 100),
  my_weights = c(2.2, 3.1, 8.2, 4.2, 5.3, 6.8)
)
```

Would the correct way to proceed be to use the "raw" data to create the histogram, like this?

```
ggplot(my_data, aes(Var_1, weight = my_weights)) +
  geom_histogram()
```

Or should I create a new weighted variable first and then use it in the plot, like this?

```
my_data <- my_data %>%
  mutate(Var_1_weighted = Var_1 * my_weights)

ggplot(my_data, aes(Var_1_weighted, weight = my_weights)) +
  geom_histogram()
```

You should do it the first way. When using weights, we say that each observation represents that many cases, which is what the `weight` aesthetic does. Imagine the variable is height, say 2 meters, and the weight is 3. That means the observation represents three 2-meter people, not one 6-meter person, which is what you get when you multiply the value by the weight.
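To make that concrete, here is a minimal sketch using the data from the question. The names `p`, `bars`, `total_people` and `p_density`, and the choices `binwidth = 500` and `boundary = 0`, are my own assumptions for illustration. It uses a plain `data.frame` so only ggplot2 is needed. `layer_data()` pulls out the bar heights ggplot2 computed, so you can check that each observation contributes its weight, not 1, to its bin.

```r
library(ggplot2)

# Same values as in the question, stored in a plain data.frame.
my_data <- data.frame(
  Var_1      = c(900, 1500, 350, 1200, 750, 100),
  my_weights = c(2.2, 3.1, 8.2, 4.2, 5.3, 6.8)
)

# Weighted histogram on the raw variable: x positions stay on the
# original scale, and each observation adds its weight to its bin.
# binwidth and boundary are arbitrary choices for this small example.
p <- ggplot(my_data, aes(Var_1, weight = my_weights)) +
  geom_histogram(binwidth = 500, boundary = 0)

# layer_data() returns the values ggplot2 computed for the bars.
bars <- layer_data(p)
bars[, c("xmin", "xmax", "count")]

# The bar heights should add up to the total number of people
# represented, i.e. the sum of the weights.
total_people <- sum(bars$count)
all.equal(total_people, sum(my_data$my_weights))

# The same idea for a density plot. The weights are rescaled to sum
# to 1 here so the result does not depend on how a given ggplot2
# version treats unnormalised weights.
p_density <- ggplot(my_data, aes(Var_1, weight = my_weights / sum(my_weights))) +
  geom_density()
```

Print `p` or `p_density` to draw the plots. Note that the x axis still runs over the original values of `Var_1`; only the bar heights (or the density curve) change with the weights.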
