Sample size calculation for two independent proportions

MetinBulus · December 15, 2023, 7:34am

Hi @QueryingQuagga, your results are correct.

The arcsine transformation stabilizes the variance of the difference in proportions, so the standard error depends only on the sample sizes and not on the proportions themselves. As a result, the power function is symmetric in the allocation, and the kappa values on the left and right are mirror images (one is roughly the reciprocal of the other). Results will also be the same for any pair of proportions that yields the same Cohen's h.

In contrast, without the arcsine transformation the variance depends on the proportions themselves, so the power function is asymmetric in the allocation and the kappa values on the left and right are no longer mirror images. Results will also differ across pairs of proportions that yield the same Cohen's h.
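
To make the symmetry argument concrete, here is a minimal base-R sketch of the two z-tests. The helper names (`cohen.h`, `power.arcsin`, `power.raw`) are made up for illustration, and the unpooled standard error on the raw scale is an assumption; the packages may differ in detail.

```r
# Cohen's h: difference between arcsine-transformed proportions
cohen.h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# One-sided ("greater") z-test power on the arcsine scale.
# The standard error involves only n1 and n2, so swapping them changes nothing.
power.arcsin <- function(p1, p2, n1, n2, alpha = 0.05) {
  se <- sqrt(1 / n1 + 1 / n2)
  pnorm(cohen.h(p1, p2) / se - qnorm(1 - alpha))
}

# Same test on the raw scale. ASSUMPTION: unpooled standard error.
# Here each group's variance is weighted by its own n, so swapping matters.
power.raw <- function(p1, p2, n1, n2, alpha = 0.05) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  pnorm((p1 - p2) / se - qnorm(1 - alpha))
}

# Compare an allocation with its mirror image (n1 and n2 swapped)
arcsin.pair <- c(power.arcsin(0.014, 0.009, n1 = 57079, n2 = 2921),
                 power.arcsin(0.014, 0.009, n1 = 2921, n2 = 57079))
raw.pair <- c(power.raw(0.014, 0.009, n1 = 57079, n2 = 2921),
              power.raw(0.014, 0.009, n1 = 2921, n2 = 57079))
print(arcsin.pair)  # two identical values
print(raw.pair)     # two different values
```

On the arcsine scale the group with the larger n can be either one, so the power curve is symmetric around the balanced design. On the raw scale it matters whether the larger n goes to the group with the larger variance p(1 - p), which is where the asymmetry comes from.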

You would probably want to use the arcsine transformation when the proportions are close to the extremes (0 or 1). Both approaches still require large samples for consistent estimates, though; essentially, they are both z-tests.

Here is a compact version of the code:

> library(pwrss)
> library(pwr)
> 
> # {pwrss}: achieved power minus target power, as a function of n2
> # (kappa = n1 / n2 = (total - x) / x)
> opt.n2.pwrss <- function(x, total = 60000, target.power = 0.80, arcsin = TRUE) {
+   calc.power <- pwrss.z.2props(p1 = 0.014, p2 = 0.009, n2 = x,
+                                kappa = (total / x) - 1,
+                                arcsin = arcsin,
+                                alternative = "greater",
+                                verbose = FALSE)$power
+   return(calc.power - target.power)
+ }
> 
> # {pwr}: same objective; pwr.2p2n.test() works on the arcsine scale via Cohen's h
> opt.n2.pwr <- function(x, total = 60000, target.power = 0.80) {
+   calc.power <- pwr.2p2n.test(h = ES.h(p1 = 0.014, p2 = 0.009),
+                               n1 = total - x,
+                               n2 = x,
+                               sig.level = 0.05,
+                               power = NULL,
+                               alternative = "greater")$power
+   return(calc.power - target.power)
+ }
> 
> # optimal solutions: roots of (power - target), one with n2 < n1 and one with n2 > n1
> sol1 <- uniroot(opt.n2.pwrss, arcsin = TRUE, interval = c(2,30000))$root
> sol2 <- uniroot(opt.n2.pwrss, arcsin = TRUE, interval = c(30000, 59998))$root
> sol3 <- uniroot(opt.n2.pwr, interval = c(2,30000))$root
> sol4 <- uniroot(opt.n2.pwr, interval = c(30000, 59998))$root
> sol5 <- uniroot(opt.n2.pwrss, arcsin = FALSE, interval = c(2,30000))$root
> sol6 <- uniroot(opt.n2.pwrss, arcsin = FALSE, interval = c(30000, 59998))$root
> 
> # summarize results (the power column is the target power, not a recomputed value)
> n2 <- ceiling(c(sol1, sol2, sol3, sol4, sol5, sol6))
> n1 <- 60000 - n2
> kappa <- round(n1 / n2, 3)
> data.frame(package = c("pwrss", "pwrss", "pwr", "pwr", "pwrss", "pwrss"),
+            arcsin = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
+            n2 = n2, n1 = n1, kappa = kappa,
+            power = rep(0.80, 6))
  package arcsin    n2    n1  kappa power
1   pwrss   TRUE  2921 57079 19.541   0.8
2   pwrss   TRUE 57080  2920  0.051   0.8
3     pwr   TRUE  2921 57079 19.541   0.8
4     pwr   TRUE 57080  2920  0.051   0.8
5   pwrss  FALSE  2345 57655 24.586   0.8
6   pwrss  FALSE 56448  3552  0.063   0.8