Tables with weighted proportions (svydesign + tbl_svysummary), what am I doing wrong?

juulia · August 31, 2021, 5:48pm

Hi,

It's my first post here, so hopefully I'm not asking something super stupid. I'm rather new to R and am having some trouble applying weights in my analysis. I'm using the 2019 National Survey on Drug Use and Health (NSDUH) dataset.

I want to create tables that show the weighted proportions for the variables in wdata (risktaking, maritalstatus, gender, education, agegroups, ethnicity, income), split by MDMA use.

I tried tbl_svysummary, but I'm obviously doing something wrong: after excluding minors the dataset has around 40,000 respondents, yet the n in the table I get goes up to hundreds of millions, which makes no sense.

Is there anyone who could help me? I've been trying to figure this out by myself but ugh, I just seem to bump into more problems every time I try something.

Code:

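```r
library(dplyr)
library(survey)
library(gtsummary)

# Keep the analysis variables plus the NSDUH design variables
# (ANALWT_C = analysis weight, VESTR = stratum, VEREP = PSU within stratum)
wdata <- NSDUH_adults %>%
  select(risktaking,
         maritalstatus,
         gender,
         education,
         agegroups,
         ethnicity,
         income,
         MDMA_use,
         ANALWT_C,
         VESTR,
         VEREP)

# Handle strata that contain a single PSU conservatively instead of failing
options(survey.lonely.psu = "adjust")

wdata2 <- svydesign(ids = ~VEREP,
                    strata = ~VESTR,
                    data = wdata,
                    weights = ~ANALWT_C,
                    nest = TRUE)

wdata2 %>%
  tbl_svysummary(
    by = MDMA_use,
    include = c(gender, risktaking, MDMA_use),
    label = list(gender ~ "Gender",
                 risktaking ~ "Risktaking")
  ) %>%
  add_p() %>%        # p-values comparing the MDMA_use groups
  add_overall() %>%
  # add a spanning header over the MDMA_use columns
  modify_spanning_header(c("stat_1", "stat_2") ~ "MDMA")
```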
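A note on `nest = TRUE` in the `svydesign()` call above: as far as I know, the NSDUH public-use file codes VEREP as 1 or 2 inside every VESTR stratum (worth checking in the codebook), so the same PSU code repeats across strata. `nest = TRUE` tells survey to treat PSU codes as nested within strata, i.e. each (stratum, PSU) pair is its own cluster. The base-R sketch below uses made-up toy data (the stratum codes are hypothetical) to show how many clusters each interpretation implies; it does not call survey itself.

```r
# Hypothetical toy design: PSU codes 1 and 2 are reused in every stratum
toy <- data.frame(
  VESTR = rep(c(30001, 30002, 30003), each = 4),
  VEREP = rep(c(1, 1, 2, 2), times = 3)
)

# Without nesting, the PSU code alone would identify a cluster
n_psu_unnested <- length(unique(toy$VEREP))

# With nesting, every (stratum, PSU) combination is a separate cluster
n_psu_nested <- length(unique(interaction(toy$VESTR, toy$VEREP, drop = TRUE)))

cat("PSUs without nesting:", n_psu_unnested, "\n")  # 2
cat("PSUs with nesting:", n_psu_nested, "\n")       # 6
```

Treating the data as having only two clusters would badly understate the design's structure, which is why the nested interpretation is the one that matches how the strata and PSUs are coded.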

StatSteph · August 31, 2021, 8:43pm

The counts in your table are weighted estimates. What is the sum of the weight variable (ANALWT_C) in your data (wdata)? There are about 250 million adults in the US, so I'd expect the weighted totals to be in that ballpark.

I'm not familiar with the gtsummary package, but I use the srvyr package for survey data analysis. You might want to look at a tutorial I made for it on GitHub: szimmer/tidy-survey-aapor-2021 (Tidy Survey Analysis in R using the srvyr Package, AAPOR).
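To see why the table's n runs into the hundreds of millions: with survey weights, each respondent stands for ANALWT_C people in the population, so a weighted count is a sum of weights rather than a number of respondents. A minimal base-R sketch on made-up toy data (the five respondents and their weights are hypothetical):

```r
# Hypothetical toy data: each weight is the number of adults the respondent represents
toy <- data.frame(
  gender   = c("Male", "Female", "Female", "Male", "Female"),
  ANALWT_C = c(5000, 7000, 6000, 4000, 8000)
)

n_sample    <- nrow(toy)                              # unweighted sample size
n_weighted  <- sum(toy$ANALWT_C)                      # estimated population size
wt_by_group <- tapply(toy$ANALWT_C, toy$gender, sum)  # weighted count per level

n_sample     # 5
n_weighted   # 30000
wt_by_group  # Female 21000, Male 9000
```

On the real data, `nrow(wdata)` gives the roughly 40,000 respondents, while `sum(wdata$ANALWT_C)` gives the estimated adult population, which is the kind of weighted total `tbl_svysummary()` reports as n.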

juulia · September 1, 2021, 6:17am

Thanks! Actually, I realized after writing this post that I should report only percentages. So there's probably nothing wrong with the code; I just need the table to show only the percentages.

Does anybody know how to do that?

StatSteph · September 1, 2021, 1:42pm

Change the `statistic` argument in your `tbl_svysummary()` call as follows:

```r
tbl_svysummary(
  by = MDMA_use,
  include = c(gender, risktaking, MDMA_use),
  label = list(gender ~ "Gender",
               risktaking ~ "Risk taking"),
  statistic = list(all_categorical() ~ "{p}%")  # show only the weighted percentage
)
```
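With `by = MDMA_use`, my understanding is that gtsummary reports column percentages by default, so `"{p}%"` is the weighted share of each level within each MDMA_use group. The sketch below reproduces that arithmetic in base R on hypothetical toy data; the helper `wt_pct()` is made up for illustration and covers point estimates only, not the design-based standard errors or p-values that survey and gtsummary compute.

```r
# Hypothetical toy data: MDMA_use splits the table into columns
toy <- data.frame(
  MDMA_use = c("No", "No", "No", "Yes", "Yes"),
  gender   = c("Male", "Female", "Female", "Male", "Female"),
  ANALWT_C = c(5000, 7000, 6000, 4000, 8000)
)

# Hypothetical helper: weighted column percentages.
# Within each group, the share of the group's total weight in each level of `var`.
wt_pct <- function(var, group, w) {
  tab <- tapply(w, list(var, group), sum)  # weighted counts: rows = var, cols = group
  tab[is.na(tab)] <- 0                     # empty cells carry no weight
  round(100 * sweep(tab, 2, colSums(tab), "/"), 1)
}

res <- wt_pct(toy$gender, toy$MDMA_use, toy$ANALWT_C)
res
```

For this toy data the "No" column comes out as 72.2% female and 27.8% male, and the "Yes" column as 66.7% and 33.3%; each column sums to 100% because the denominator is that group's total weight, not the overall total.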

juulia · September 2, 2021, 10:41am

Thank you so much! That solved it!
