Specific Aggregate Type for Each Column

michaelgloven · September 16, 2023, 2:34pm

I have a data frame where I'd like to apply different aggregate types (mean, max, min, etc.) to selected columns. How can I modify the code below so that I can specify different aggregations for fields x3 and x4?

library(dplyr)

x1 <- c("T", "T", "T", "T", "T", "T", "F", "F", "F", "N")
x2 <- c("A", "B", "A", "B", "A", "B", "A", "B", "C", "D")
x3 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x4 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x5 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
d <- data.frame(x1, x2, x3, x4, x5)

# Define grouping columns and aggregated fields
groups <- c("x1", "x2")
fields <- c("x3", "x4")

# Define aggregation functions
types <- list(list(mean = mean,
              max = max,
              min = min),
              list(length = length)
)

# Use summarise() to aggregate data
d %>%
    group_by(across(all_of(groups))) %>%
    summarise(across(all_of(fields), types))

nirgrahamuk · September 18, 2023, 11:30am

Doing it by hand would be:

d |>
  group_by(across(all_of(groups))) |>
  summarise(
    across(fields[1],
           .fns=types[[1]]),
    across(
      fields[2],
      .fns=types[[2]]))

Doing some metaprogramming with rlang might be:

library(dplyr)
library(purrr)  # map_chr()
library(rlang)  # parse_exprs()
mycalls <- map_chr(seq_along(fields),
     \(x){
       paste0(
         "across(.col=fields[",x,"],.fns=types[[",x,"]])")}) |>
  rlang::parse_exprs()

d |>
  group_by(across(all_of(groups))) |>
  summarise(
  !!!(mycalls))
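
A variant of the same idea that skips the paste-and-parse step: the calls can be built directly with rlang::expr() and !!, pairing each field with its list of functions through purrr::map2(). This is only a sketch on the example data from the question; `calls` and `out` are names made up here, and `.groups = "drop"` is an addition to avoid the regrouping message.

```r
library(dplyr)
library(purrr)
library(rlang)

# Example data from the question
d <- data.frame(
  x1 = c("T", "T", "T", "T", "T", "T", "F", "F", "F", "N"),
  x2 = c("A", "B", "A", "B", "A", "B", "A", "B", "C", "D"),
  x3 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  x4 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

groups <- c("x1", "x2")
fields <- c("x3", "x4")

# One list of functions per field, in the same order as `fields`
types <- list(
  list(mean = mean, max = max, min = min),
  list(length = length)
)

# Build one across() call per field; !! inlines the column name
# and the list of functions into each call
calls <- map2(fields, types, \(f, fns) expr(across(all_of(!!f), !!fns)))

# Splice the calls into summarise()
out <- d |>
  group_by(across(all_of(groups))) |>
  summarise(!!!calls, .groups = "drop")

print(out)
```

The result should carry columns named after the field and the function (x3_mean, x3_max, x3_min, x4_length), since the inner lists are named.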

michaelgloven · September 18, 2023, 5:33pm

Thank you. However, the issue I run into is that the aggregation "types" I actually receive look like this (not sure how to reprex this):

[1] "list"
List of 2
 $ : chr "mean"
 $ : chr "min"

and I am unable to get it into a form that works in your second function. I've tried noquote(unlist(types)), which gives me:

[1] mean min
'noquote' chr [1:2] "mean" "min"

Any suggestions on how to pass the function arguments in sequence as just mean and min?

nirgrahamuk · September 18, 2023, 9:21pm

Not sure what you are showing me. Is it that you don't have a function per se, but the text name of a function, a la "mean"?

michaelgloven · September 18, 2023, 9:38pm

OK, I have a Shiny application, and when I try to pass the type values from my UI (mean, min, max, etc.) for each field into your function, I get this error:

Warning: Error in summarise: ℹ In argument: `across(.col = fields[1], .fns = types[[1]])`.
Caused by error in `across()`:
! `.fns` must be a function, a formula, or a list of functions/formulas.

I believe the error results from the format of the values coming from the UI, which look like this when printed to the console:

[1] "list"
List of 2
 $ : chr "mean"
 $ : chr "mean"
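
For what it's worth, the same kind of failure can be triggered outside Shiny by handing across() a character string instead of a function. A minimal sketch, using a made-up data frame `toy` with columns `g` and `x` (none of these names come from my app):

```r
library(dplyr)

# Made-up data, only to reproduce the problem
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# .fns is the string "mean", not the function mean;
# tryCatch() captures the error message instead of stopping
msg <- tryCatch(
  toy |>
    group_by(g) |>
    summarise(across(x, .fns = "mean")),
  error = function(e) conditionMessage(e)
)

print(msg)

# The same call with the function itself works
ok <- toy |>
  group_by(g) |>
  summarise(across(x, .fns = mean))
```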

My use case is to let each selected column in a data frame have its own aggregation: field1 could have min, field2 could have max, etc.

The reprex I provided (repeated below) was my attempt to mimic the format of the types above, and your code works in that case. So, types works when it is:

types <- list(list(mean = mean,
              max = max,
              min = min),
              list(length = length))

print(types)

[[1]]
[[1]]$mean
function (x, ...) 
UseMethod("mean")
<bytecode: 0x00000221e3e3e2b0>
<environment: namespace:base>

[[1]]$max
function (..., na.rm = FALSE)  .Primitive("max")

[[1]]$min
function (..., na.rm = FALSE)  .Primitive("min")

[[2]]
[[2]]$length
function (x)  .Primitive("length")

I hope this makes sense. Any suggestions are appreciated.

nirgrahamuk · September 19, 2023, 8:24am

OK, so types is now not a list of named functions but a list of function names.
I would translate the names to their functions, and then the original code would work, i.e.

x1 <- c("T", "T", "T", "T", "T", "T", "F", "F", "F", "N")
x2 <- c("A", "B", "A", "B", "A", "B", "A", "B", "C", "D")
x3 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x4 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x5 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
d <- data.frame(x1, x2, x3, x4, x5)

# Define grouping columns and aggregated fields
groups <- c("x1", "x2")
fields <- c("x3", "x4")

# Define aggregation functions as character strings
types <- list(list(mean = "mean",
                   max = "max",
                   min = "min"),
              list(length = "length")
)

ftypes <- lapply(types,
                 \(x)mget(unlist(x),inherits = TRUE))

library(dplyr)
library(purrr)  # map_chr()
library(rlang)  # parse_exprs()
mycalls <- map_chr(seq_along(fields),
     \(x){
       paste0(
         "across(.col=fields[",x,"],.fns=ftypes[[",x,"]])")}) |>
  rlang::parse_exprs()

d |>
  group_by(across(all_of(groups))) |>
  summarise(
  !!!(mycalls))

note the use of ftypes as an intermediary.
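
Since the names come from a Shiny UI, a slightly more defensive lookup may be worth considering: mget() with inherits = TRUE will return whatever object carries that name, function or not. The sketch below assumes the shape reported above (one function name per field) and resolves the names through an allow-list instead. `allowed` and `resolve_fns()` are hypothetical names, not part of dplyr or rlang.

```r
library(dplyr)

d <- data.frame(
  x1 = c("T", "T", "T", "T", "T", "T", "F", "F", "F", "N"),
  x2 = c("A", "B", "A", "B", "A", "B", "A", "B", "C", "D"),
  x3 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  x4 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

groups <- c("x1", "x2")
fields <- c("x3", "x4")

# Shape reported from the UI: one function name per field
types <- list("mean", "min")

# Hypothetical allow-list of aggregations the app is willing to run
allowed <- list(mean = mean, min = min, max = max, length = length)

# Turn function names into a named list of functions,
# stopping on anything that is not in the allow-list
resolve_fns <- function(fn_names) {
  fn_names <- unlist(fn_names)
  unknown <- setdiff(fn_names, names(allowed))
  if (length(unknown) > 0) {
    stop("Unsupported aggregation: ", paste(unknown, collapse = ", "))
  }
  allowed[fn_names]
}

ftypes <- lapply(types, resolve_fns)

# Written out by hand for two fields to keep the focus on the lookup
out <- d |>
  group_by(across(all_of(groups))) |>
  summarise(
    across(all_of(fields[1]), ftypes[[1]]),
    across(all_of(fields[2]), ftypes[[2]]),
    .groups = "drop"
  )

print(out)
```

Because allowed[fn_names] keeps the names, the output columns come out as x3_mean and x4_min, and a name outside the list fails early with a readable message rather than inside summarise().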

michaelgloven · September 19, 2023, 1:46pm

awesome, this works! thanks
