I'm trying to compute row-wise summaries programmatically over columns that I've created with another function and rlang.
This example has 2 traits and 3 materials, but I won't know how many of either there are in advance, and I need to average the predictions for each trait within each row.
library(rlang)
library(dplyr)
set.seed(42)
df1 <- tibble(
  PLTID = 1:10,
  vigor_M1_pred = runif(10), vigor_M2_pred = runif(10), vigor_M3_pred = runif(10),
  senes_M1_pred = runif(10), senes_M2_pred = runif(10), senes_M3_pred = runif(10)
)
traits <- c("vigor", "senes")
mats <- c("M1", "M2", "M3")
# this gives me what I want, but I want to do it programmatically
df1 %>%
  mutate(
    vigor_avg = rowMeans(select(., starts_with("vigor"))),
    senes_avg = rowMeans(select(., starts_with("senes")))
  )
funs2 <- setNames(paste('rowMeans(select(., starts_with("', traits, '"), na.rm = TRUE)', sep = ""), paste0(traits, "_avg"))
#can't parse this
df1 %>% mutate(., !!!rlang::parse_exprs(funs2))
#Error in parse(text = x) : <text>:1:55: unexpected ';'
#1: rowMeans(select(., starts_with("vigor"), na.rm = TRUE);
I'd like to be able to generate the names and the function calls using the traits, but I can't seem to get it to happen.
I was pulling a lot from the discussion here:
Jiho has non-quoted arguments though. I'm trying to paste together the function and then parse it.
Pasting strings together is always fraught with difficulties. rlang provides you with a rich toolkit for working with raw expressions, so in your example you can do something like this:
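A minimal sketch of that idea, assuming dplyr 1.1.0 or later so that `pick()` is available (it stands in for `select(., ...)` here, which is my substitution rather than anything from the question); `avg_exprs` is just an illustrative name:

```r
library(rlang)
library(dplyr)

set.seed(42)
df1 <- tibble(
  PLTID = 1:10,
  vigor_M1_pred = runif(10), vigor_M2_pred = runif(10), vigor_M3_pred = runif(10),
  senes_M1_pred = runif(10), senes_M2_pred = runif(10), senes_M3_pred = runif(10)
)
traits <- c("vigor", "senes")

# Build one unevaluated call per trait. expr() quotes the call as written,
# and !!tr inlines the trait string into it, so no text is pasted or parsed.
avg_exprs <- lapply(traits, function(tr) {
  expr(rowMeans(pick(starts_with(!!tr)), na.rm = TRUE))
})

# The list names become the names of the new columns.
names(avg_exprs) <- paste0(traits, "_avg")

# !!! splices the named list into mutate() as if each call had been typed by hand.
res <- df1 %>% mutate(!!!avg_exprs)
```

Because the calls are built as expressions, unbalanced parentheses like the one in the pasted string cannot occur. Note that `starts_with("vigor")` would also match an existing `vigor_avg` column if the code were run again on its own output.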
The challenge here is that the variables "trait" and "material" are encoded in the column names, so the data are not tidy.
I simply tidied the data.
I gathered the column names into a single tag column, which then holds all of that information.
Then I split the tag column into trait and material columns. If the tags are more complex, you can always use case_when and str_detect (from the stringr package) to assign the trait and material instead.
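The steps above can be sketched roughly as follows. This assumes tidyr's `pivot_longer()`, `separate()` and `pivot_wider()` in place of gather/spread, and the column names `tag`, `pred`, `type` and `avg` are hypothetical names chosen for illustration:

```r
library(dplyr)
library(tidyr)
library(stringr)

set.seed(42)
df1 <- tibble(
  PLTID = 1:10,
  vigor_M1_pred = runif(10), vigor_M2_pred = runif(10), vigor_M3_pred = runif(10),
  senes_M1_pred = runif(10), senes_M2_pred = runif(10), senes_M3_pred = runif(10)
)

# Gather every prediction column into a tag/value pair, then split the tag
# ("vigor_M1_pred") on "_" into trait, material and a leftover suffix.
long <- df1 %>%
  pivot_longer(-PLTID, names_to = "tag", values_to = "pred") %>%
  separate(tag, into = c("trait", "material", "type"), sep = "_")

# Average over materials within each plot and trait, then go back to wide form.
avgs <- long %>%
  group_by(PLTID, trait) %>%
  summarise(avg = mean(pred, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = trait, values_from = avg, names_glue = "{trait}_avg")

# Alternative for messier tags: derive the trait with case_when() + str_detect().
long2 <- df1 %>%
  pivot_longer(-PLTID, names_to = "tag", values_to = "pred") %>%
  mutate(trait = case_when(
    str_detect(tag, "vigor") ~ "vigor",
    str_detect(tag, "senes") ~ "senes"
  ))
```

Neither the number of traits nor the number of materials is hard-coded in the first version, so it should keep working as more of either are added, provided the column names keep the `trait_material_pred` pattern.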
Chenxin,
Thanks for your response too. After posting, I did try to do this using pivot_longer/pivot_wider (gather/spread) and summarize. I originally wanted to use rowMeans because I couldn't use gather/spread on a spatial tibble: https://github.com/r-spatial/sf/issues/1149
However, removing the geometry, isolating the important columns and applying this strategy would have been an equally good solution.
The column name coding was deliberate: it comes from a previous map2(mutate) step that generates the input data frame. My thinking was that standardized names would let me use the select helpers to isolate these columns and run row-wise functions on them.