Using weights of survey. svydesign ()

Lucas11 · April 16, 2020, 12:28pm

Hi, I have a large dataset with the weight of each observation of a survey. I´d like to use these weights to manipulate the data properly and then create a histogram or a density plot.
I know I have to use the svydesign () function, but I am not sure of the paramaters I must use.
I have:

My_data_frame
The column weights (with values between 2.2 and 8.2 for the 20.000 rows),
The rest of the columns (same number of rows)

Any suggestion of how to start? Thanks in advance

StatSteph · April 23, 2020, 12:54pm

Using svydesign from the package survey does more than incorporate weights, it also incorporates the sampling design. Generally in the survey data documentation, you can find out what the sampling design was and how to estimate variances using the PSUs, strata, or replicate weights. Those are part of what goes into the svydesign function.

Take a look here: http://r-survey.r-forge.r-project.org/survey/

Perhaps you can share with us what the data is like and maybe what survey and then I can help out more.

Lucas11 · April 28, 2020, 9:59am

Thanks, I've been checking the guide.
The problem I found is that all the exercises and examples are made to design a representative sample an then use it. I need a different approach, I have the sample designed with their respective weights. In this case they are statistics of the income tax. And I want to use functions like svytable, svyplot for my analisys.
I think it should be simple, but the only thing I know about my sample is the weight vector and that the sample size is calculated for an error, in the mean of the income variable, less than 3% with a confidence level of 3 per thousand.
I am not sure how to proceed to tell R the survey design in this case

StatSteph · April 28, 2020, 11:45am

What survey is this? I'm not sure I understand your statement "sample size is calculated for an error, in the mean of the income variable, less than 3% with a confidence level of 3 per thousand" You could assume it is a simple random sample if you really don't know anything else. Note the survey design only impacts variances, standard errors, and test statistics, it doesn't impact point estimates like means and totals. Here's an example using the srvyr/survey package:

library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#> 
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#> 
#>     dotchart
library(tidyverse)
library(srvyr)
#> 
#> Attaching package: 'srvyr'
#> The following object is masked from 'package:stats':
#> 
#>     filter

# In this example, the data apiclus1 has a weight of pw
# We will pretend it is from a simple random sample

# https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html

data(api) # this loads in some example data from survey package

my_des <- apisrs %>%
  as_survey_design(ids=1, #no cluster
                   weights=pw
                   )

# Making a frequency table using a survey object
my_des %>%
  group_by(awards) %>%
  summarize(proportion=survey_mean(),
            total=survey_total())
#> # A tibble: 2 x 5
#>   awards proportion proportion_se total total_se
#>   <fct>       <dbl>         <dbl> <dbl>    <dbl>
#> 1 No           0.38        0.0344 2354.     213.
#> 2 Yes          0.62        0.0344 3840.     213.
  

svyplot(api00~api99, design=my_des)

^{Created on 2020-04-28 by the reprex package (v0.3.0)}

Lucas11 · May 6, 2020, 8:06am

Thanks for your help I was I bit confused regarding the idea of how to use the weights of my data because I didn't have more info about the sample but I think this was the key:
" You could assume it is a simple random sample if you really don't know anything else."

system · May 13, 2020, 8:06am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.