Pairwise proportions test with complex survey sample design

sgenter · April 2, 2024, 1:06pm

Posit Community,

I'm looking for a solution to conduct difference of proportions tests and pairwise difference of proportions tests with survey data. I have an example below the uses rstatix pairwise_prop_test() to give a general sense of what I'm aiming for, but this function won't work since it doesn't account for a complex sample design. The best source I could think of to solve this problem is the survey package, but I did not find a function to perform a difference of proportions test there. Any help would be greatly appreciated.

suppressPackageStartupMessages(
  suppressWarnings({
    library(tidyverse)
    library(srvyr)
    library(rstatix)
}))  
 
 # Minimal example
 set.seed(5)
 df <- data.frame(gender = rep(c("Female", "Male"), 50 ),
                  job_type = sample(c("type1", "type2", "type3"), 100, replace = TRUE),
                  smpl_wts = runif(100),
                  strat = sample(1:2, 100, replace = TRUE)) 
 
 # Survey design object
 df_design <- df %>% 
   as_survey_design(weight = smpl_wts,
                    strata = strat)
 
 # Here is an example using {rstatix} pairwise_prop_test that's in the
 # neighborhood of what I'd like to do, but the results are incorrect because
 # it does not account for the sample design
 df_design %>% 
   survey_count(gender, job_type) %>% 
   group_by(job_type) %>% 
   mutate(type_total = sum(n),
          percent_women = ((n/type_total) * 100) %>% round(1)) %>% 
   select(gender, job_type, n) %>% 
   pivot_wider(names_from = "gender", values_from = "n") %>% 
   column_to_rownames("job_type") %>% 
   pairwise_prop_test(p.adjust.method = "bonferroni")
#> # A tibble: 3 × 5
#>   group1 group2     p p.adj p.adj.signif
#> * <chr>  <chr>  <dbl> <dbl> <chr>       
#> 1 type1  type2  0.282 0.846 ns          
#> 2 type1  type3  0.854 1     ns          
#> 3 type2  type3  0.572 1     ns

^{Created on 2024-04-02 with reprex v2.0.2}

StatSteph · April 3, 2024, 9:49pm

Here's a solution which is specific to your provided example but could be functionalized with more time. Testing is done for pairs using the function svyttest() from the package survey. My solution doesn't add the column of "p.adj.signif" as I'm not quite sure how that's done in the documentation from pairwise_prop_test().

suppressPackageStartupMessages(
  suppressWarnings({
    library(tidyverse)
    library(srvyr)
    # library(rstatix)
    library(survey)
    library(broom)
  }))  

# Minimal example
set.seed(5)
df <- data.frame(gender = rep(c("Female", "Male"), 50 ),
                 job_type = sample(c("type1", "type2", "type3"), 100, replace = TRUE),
                 smpl_wts = runif(100),
                 strat = sample(1:2, 100, replace = TRUE)) 

# Survey design object
df_design <- df %>% 
  as_survey_design(weight = smpl_wts,
                   strata = strat) 

# This could be functionalized but is made for this very specific example

levs <- df_design[["variables"]]$job_type %>% unique()
pairs <- combn(levs, 2)

test_diff <- function(i){
  v1 <- pairs[1, i]
  v2 <- pairs[2, i]
  df_design %>%
    filter(job_type %in% c(v1, v2)) %>% # filter to group1 and group2 only
    mutate(Female=gender=="Female") %>% # make a logical variable for Female
    svyttest(Female~job_type, design=.) %>% # perform test
    tidy() %>% # use broom to tidy
    mutate(group1=v1, group2=v2) %>%
    select(group1, group2, p=p.value)
}

1:ncol(pairs) %>%
  map(test_diff) %>% # run for each pair
  bind_rows() %>% #stack the results
  mutate(p.adj=p.adjust(p, method="bonferroni"))  # calculate the adjustment
#> # A tibble: 3 × 4
#>   group1 group2      p p.adj
#>   <chr>  <chr>   <dbl> <dbl>
#> 1 type2  type3  0.261  0.784
#> 2 type2  type1  0.0763 0.229
#> 3 type3  type1  0.520  1

^{Created on 2024-04-03 with reprex v2.1.0}

sgenter · April 11, 2024, 7:50pm

This is great, I can definitely work with this. Thank you thank you.

system · April 18, 2024, 7:50pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.