Pass ... arguments into map() function

lorae · November 7, 2024, 3:55pm

Hi all, this is my first post on the Posit Community - I'm really grateful for the help! Some quick background on my project: I work in sociology with IPUMS data, where each row is a person and columns specify various attributes of that person like age, sex, the size of the household they live in, etc. I have a number of functions summarizing and slicing and splicing the data - running regressions, summarizing variables by groups, etc.

I'm working on a function, called bootstrap_replicates(), that takes another function and some data as inputs. I want it to output the result of the input function on the data as well as a list of several bootstrapped results using replicate weights. (I'll eventually use these bootstrapped outputs to calculate standard errors on my estimates).

I'm trying to make my bootstrap_replicates() function as flexible as possible, using the ... to include additional arguments in the input function f. But I'm struggling to get the result I want when I pass this ... into a map function within bootstrap_replicates(). The numerical result is not what I'm expecting. If someone could help explain why this is happening and a good workaround I'd be really grateful.

BTW, I know I could just avoid the problem by explicitly specifying the hhsize argument within bootstrap_replicates(), but that would pose a problem if I try to use bootstrap_replicates() on other functions, like the count_by_sex() function in my example below, which doesn't have a hhsize input argument.

Replicable example below. Thank you in advance!

library(dplyr)
library(purrr)

# Input data
input <- tibble(
  per_id = c(1, 2, 3, 4, 5),
  sex = c(1, 0, 1, 1, 0),
  hhsize = c(2, 3, 2, 1, 1),
  wt = c(10, 12, 15, 30, 20),
  repwt1 = c(11, 13, 16, 28, 22),
  repwt2 = c(8, 8, 16, 25, 22),
  repwt3 = c(2, 4, 10, 14, 13),
  repwt4 = c(18, 17, 11, 25, 15)
)

# Two example functions that I want to bootstrap in `bootstrap_replicates()`.
# Note that they must have an explicit argument for weight (`wt`) to work
# within `bootstrap_replicates()`
hhsize_by_sex <- function(
    data,
    wt, # string name of weight column in `data`
    hhsize # string name of hhsize column in `data`
    ) {
  result <- data |>
    group_by(sex) |>
    summarize(
      weighted_mean = sum(.data[[hhsize]] * .data[[wt]], na.rm = TRUE)/sum(.data[[wt]], na.rm = TRUE),
      .groups = "drop"
    )
  
  return(result)
}

count_by_sex <- function(
    data,
    wt # string name of weight column in `data`
) {
  result <- data |>
    group_by(sex) |>
    summarize(
      weighted_count = sum(.data[[wt]]),
      .groups = "drop"
    )
  
  return(result)
}

# Test out the functions
hhsize_by_sex(input, wt = "wt", hhsize = "hhsize")
count_by_sex(input, wt = "wt")


# Calculates results of a target function `f()` and also calculates results of the 
# target function subbing each of the specified `repwt_col` arguments for the 
# `wt` argument within `f()`
bootstrap_replicates <- function(
    data, 
    f, # function producing new columns for standard errors. Must have an argument
    # that is called "wt"
    wt_col = "wt", # string name of weight column in `data`
    repwt_cols = paste0("repwt", 1:4), # Vector of strings of replicate weight columns
    # in `data`
    ... # Any additional arguments needed for function f
    ) {
  main_estimate <- f(data, wt = wt_col, ...)
  replicate_estimates <- map(repwt_cols, ~ f(data, wt = .x, ...))
  
  # Return results
  list(
    main_estimate = main_estimate,
    replicate_estimates = replicate_estimates
  )
}


# Initialize the names of the replicate weight columns
repwt_vector <- paste0("repwt", 1:4)

# This is the result of the function. The bootstrapped replicates are not what
# I expected
bootstrap_replicates(
  data = input, 
  f = hhsize_by_sex, 
  wt_col = "wt", 
  repwt_cols = repwt_vector, 
  hhsize = "hhsize"
  )

# This is what the bootstrapped replicates ~should~ be. 
map(repwt_vector, ~ hhsize_by_sex(input, wt = .x, hhsize = "hhsize"))

arangaca · November 7, 2024, 6:00pm

Use an anonymous function instead.

function(.x) f(data, wt = .x, ...)

Or

\(.x) f(data, wt = .x, ...) # requires R ≥ 4.1.0

bootstrap_replicates <- function(
    data, 
    f, # function producing new columns for standard errors. Must have an argument
    # that is called "wt"
    wt_col = "wt", # string name of weight column in `data`
    repwt_cols = paste0("repwt", 1:4), # Vector of strings of replicate weight columns
    # in `data`
    ... # Any additional arguments needed for function f
    ) {
  main_estimate <- f(data, wt = wt_col, ...)
  replicate_estimates <- map(repwt_cols, function(.x) f(data, wt = .x, ...))
  
  # Return results
  list(
    main_estimate = main_estimate,
    replicate_estimates = replicate_estimates
  )
}

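In your original code, the formula is converted into a function using rlang::as_function(), and that function has its own arguments, including its own ... argument. The ... inside the formula therefore refers to whatever map() passes to that generated function (the current element of repwt_cols), not to the ... of bootstrap_replicates(). So hhsize = "hhsize" never reaches f; instead the replicate weight name is passed a second time, unnamed, and gets matched positionally to hhsize. I don't think it's possible to use the ellipsis implicitly to pass your own arguments in that case.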
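To see the difference in isolation, here is a minimal sketch. The names show_args(), outer_formula() and outer_anon() are made up for illustration and are not part of the original code; show_args() simply reports what it receives, so you can compare what each style of lambda forwards.

```r
library(purrr)

# Hypothetical helper: reports what it received as `wt` and through `...`
show_args <- function(wt, ...) {
  list(wt = wt, extra = list(...))
}

# Formula lambda: the `...` in the body belongs to the generated function,
# so it holds what map() passed in, not the arguments of outer_formula()
outer_formula <- function(cols, ...) {
  map(cols, ~ show_args(wt = .x, ...))
}

# Anonymous function: it declares no `...` of its own, so `...` is looked up
# in the enclosing function, outer_anon()
outer_anon <- function(cols, ...) {
  map(cols, function(.x) show_args(wt = .x, ...))
}

# Argument names of the function built from a formula
lambda_formals <- names(formals(rlang::as_function(~ .x)))
print(lambda_formals)

res_formula <- outer_formula("repwt1", hhsize = "hhsize")
res_anon <- outer_anon("repwt1", hhsize = "hhsize")

# Compare what ended up in `...` in each case
str(res_formula[[1]]$extra)
str(res_anon[[1]]$extra)
```

With the formula version, the extra arguments should contain the element of cols a second time and no hhsize; with the anonymous function, hhsize = "hhsize" comes through as intended.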

That said, I wouldn't recommend using the ellipsis to pass arguments to a callback function. That makes the code harder to reason about. It's usually better to pass a callback function that already has the required arguments filled in.
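A rough sketch of what that could look like is below. It assumes the callback always has the signature f(data, wt), and the caller fixes everything else up front. The names bootstrap_replicates_cb() and weighted_hhsize() are hypothetical, and weighted_hhsize() is a simplified base R stand-in for hhsize_by_sex() (no grouping by sex) to keep the example short.

```r
library(purrr)

# Subset of the example data from the question
input <- data.frame(
  sex    = c(1, 0, 1, 1, 0),
  hhsize = c(2, 3, 2, 1, 1),
  wt     = c(10, 12, 15, 30, 20),
  repwt1 = c(11, 13, 16, 28, 22),
  repwt2 = c(8, 8, 16, 25, 22)
)

# Simplified stand-in for hhsize_by_sex(): overall weighted mean of a column
weighted_hhsize <- function(data, wt, hhsize) {
  sum(data[[hhsize]] * data[[wt]], na.rm = TRUE) / sum(data[[wt]], na.rm = TRUE)
}

# Hypothetical variant without `...`: `f` must accept exactly (data, wt)
bootstrap_replicates_cb <- function(data, f, wt_col = "wt", repwt_cols) {
  list(
    main_estimate = f(data, wt_col),
    replicate_estimates = map(repwt_cols, function(w) f(data, w))
  )
}

# The caller fixes the extra argument (hhsize) inside the callback itself
result <- bootstrap_replicates_cb(
  data = input,
  f = function(data, wt) weighted_hhsize(data, wt, hhsize = "hhsize"),
  repwt_cols = c("repwt1", "repwt2")
)

str(result)
```

A function like count_by_sex(), which already takes only data and wt, could be passed as f directly without a wrapper.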

lorae · November 7, 2024, 8:33pm

Thank you so much! And your point is taken. I'll see how I can refactor my code to be more explicit about the extra arguments.
