Hi all, this is my first post on the Posit Community, and I'm really grateful for the help! Some quick background on my project: I work in sociology with IPUMS data, where each row is a person and the columns describe attributes of that person, such as age, sex, and the size of the household they live in. I have a number of functions that summarize and slice the data: running regressions, summarizing variables by group, and so on.
I'm working on a function called `bootstrap_replicates()` that takes another function and some data as inputs. I want it to return the result of the input function on the data, as well as a list of bootstrapped results computed with replicate weights. (I'll eventually use these bootstrapped outputs to calculate standard errors on my estimates.)
I'm trying to make `bootstrap_replicates()` as flexible as possible by using `...` to forward additional arguments to the input function `f`. But I'm struggling to get the result I want when I pass `...` into a `map()` call within `bootstrap_replicates()`: the numerical result is not what I'm expecting. If someone could explain why this is happening and suggest a good workaround, I'd be really grateful.
BTW, I know I could avoid the problem by explicitly specifying the `hhsize` argument within `bootstrap_replicates()`, but that would pose a problem if I try to use `bootstrap_replicates()` on other functions, like the `count_by_sex()` function in my example below, which doesn't have a `hhsize` argument.
Reproducible example below. Thank you in advance!
library(dplyr)
library(purrr)
# Input data
input <- tibble(
  per_id = c(1, 2, 3, 4, 5),
  sex    = c(1, 0, 1, 1, 0),
  hhsize = c(2, 3, 2, 1, 1),
  wt     = c(10, 12, 15, 30, 20),
  repwt1 = c(11, 13, 16, 28, 22),
  repwt2 = c(8, 8, 16, 25, 22),
  repwt3 = c(2, 4, 10, 14, 13),
  repwt4 = c(18, 17, 11, 25, 15)
)
# Two example functions that I want to bootstrap in `bootstrap_replicates()`.
# Note that they must have an explicit argument for the weight (`wt`) to work
# within `bootstrap_replicates()`.
hhsize_by_sex <- function(
    data,
    wt,     # string name of weight column in `data`
    hhsize  # string name of hhsize column in `data`
) {
  result <- data |>
    group_by(sex) |>
    summarize(
      weighted_mean = sum(.data[[hhsize]] * .data[[wt]], na.rm = TRUE) /
        sum(.data[[wt]], na.rm = TRUE),
      .groups = "drop"
    )
  return(result)
}
count_by_sex <- function(
    data,
    wt  # string name of weight column in `data`
) {
  result <- data |>
    group_by(sex) |>
    summarize(
      weighted_count = sum(.data[[wt]]),
      .groups = "drop"
    )
  return(result)
}
# Test out the functions
hhsize_by_sex(input, wt = "wt", hhsize = "hhsize")
count_by_sex(input, wt = "wt")
# Calculates results of a target function `f()` and also calculates results of the
# target function subbing each of the specified `repwt_col` arguments for the
# `wt` argument within `f()`
bootstrap_replicates <- function(
    data,
    f,                                  # function producing new columns for standard errors.
                                        # Must have an argument that is called "wt"
    wt_col = "wt",                      # string name of weight column in `data`
    repwt_cols = paste0("repwt", 1:4),  # vector of strings of replicate weight columns
                                        # in `data`
    ...                                 # any additional arguments needed for function f
) {
  main_estimate <- f(data, wt = wt_col, ...)
  replicate_estimates <- map(repwt_cols, ~ f(data, wt = .x, ...))
  # Return results
  list(
    main_estimate = main_estimate,
    replicate_estimates = replicate_estimates
  )
}
# Initialize the names of the replicate weight columns
repwt_vector <- paste0("repwt", 1:4)
# This is the result of the function. The bootstrapped replicates are not what
# I expected
bootstrap_replicates(
  data = input,
  f = hhsize_by_sex,
  wt_col = "wt",
  repwt_cols = repwt_vector,
  hhsize = "hhsize"
)
# This is what the bootstrapped replicates ~should~ be.
map(repwt_vector, ~ hhsize_by_sex(input, wt = .x, hhsize = "hhsize"))
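One likely explanation, sketched here as an assumption rather than a confirmed diagnosis: a purrr formula lambda (`~ ...`) is turned into a function that has its own `...`, and `map()` passes the current element as that function's first positional argument. Inside the lambda, `...` would therefore hold the replicate weight name instead of the `hhsize = "hhsize"` argument given to `bootstrap_replicates()`, so `hhsize` ends up matched to `"repwt1"`, `"repwt2"`, and so on. The name `bootstrap_replicates_fixed()` below is hypothetical. It differs only in using an ordinary anonymous function, which has no `...` of its own and so picks up the outer one.

```r
library(dplyr)
library(purrr)

# Same toy data and target function as in the example above
input <- tibble(
  per_id = c(1, 2, 3, 4, 5),
  sex    = c(1, 0, 1, 1, 0),
  hhsize = c(2, 3, 2, 1, 1),
  wt     = c(10, 12, 15, 30, 20),
  repwt1 = c(11, 13, 16, 28, 22),
  repwt2 = c(8, 8, 16, 25, 22),
  repwt3 = c(2, 4, 10, 14, 13),
  repwt4 = c(18, 17, 11, 25, 15)
)

hhsize_by_sex <- function(data, wt, hhsize) {
  data |>
    group_by(sex) |>
    summarize(
      weighted_mean = sum(.data[[hhsize]] * .data[[wt]], na.rm = TRUE) /
        sum(.data[[wt]], na.rm = TRUE),
      .groups = "drop"
    )
}

# Diagnostic: inspect what `...` contains inside a formula lambda.
# Each result holds the current element of the input vector.
dots_seen <- map(c("repwt1", "repwt2"), ~ list(...))

# Hypothetical variant of bootstrap_replicates(): the anonymous function
# takes only `w`, so `...` inside it refers to the enclosing function's dots.
bootstrap_replicates_fixed <- function(
    data,
    f,
    wt_col = "wt",
    repwt_cols = paste0("repwt", 1:4),
    ...
) {
  main_estimate <- f(data, wt = wt_col, ...)
  replicate_estimates <- map(repwt_cols, function(w) f(data, wt = w, ...))
  list(
    main_estimate = main_estimate,
    replicate_estimates = replicate_estimates
  )
}

repwt_vector <- paste0("repwt", 1:4)

fixed <- bootstrap_replicates_fixed(
  data = input,
  f = hhsize_by_sex,
  wt_col = "wt",
  repwt_cols = repwt_vector,
  hhsize = "hhsize"
)

# Direct computation to compare against
expected <- map(repwt_vector, ~ hhsize_by_sex(input, wt = .x, hhsize = "hhsize"))
identical(fixed$replicate_estimates, expected)
```

Since the example already uses the native pipe, the shorthand `\(w) f(data, wt = w, ...)` should behave the same way as `function(w)`. If this explanation is right, the original version would also fail on `count_by_sex()`, because the weight name would be passed as an extra, unused argument.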