parse vector of functions in mutate

I'm trying to summarize data programmatically, row-wise, across columns that I created with another function and rlang.
This example has 2 traits and 3 materials, but I don't know in advance how many there will be, and I need to average these predictions for each trait.

library(rlang)
library(dplyr)

set.seed(42)

df1 <- tibble(PLTID = 1:10, vigor_M1_pred = runif(10), vigor_M2_pred = runif(10), vigor_M3_pred = runif(10), senes_M1_pred = runif(10), senes_M2_pred = runif(10), senes_M3_pred = runif(10))


traits <- c("vigor", "senes")
mats <- c("M1", "M2", "M3")

# this gives me what I want, but I want to do it programmatically
df1 %>% 
	mutate(
		vigor_avg = rowMeans(select(., starts_with("vigor"))),
		senes_avg = rowMeans(select(., starts_with("senes")))
		)


funs2 <-  setNames(paste('rowMeans(select(., starts_with("', traits, '"), na.rm = TRUE)', sep = ""), paste0(traits, "_avg"))

#can't parse this
df1 %>% mutate(., !!!rlang::parse_exprs(funs2))
#Error in parse(text = x) : <text>:1:55: unexpected ';'
#1: rowMeans(select(., starts_with("vigor"), na.rm = TRUE);

I'd like to be able to generate the names and the function calls using the traits, but I can't seem to get it to happen.
I was pulling a lot from the discussion here:

Jiho has non-quoted arguments though. I'm trying to paste together the function and then parse it.
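
A side note on the parse error above: counting parentheses in the pasted string suggests that select( is never closed, so the string is not valid R before rlang ever sees it. The sketch below uses only base R to check this. The helper name parses_ok is made up for illustration, and the "fixed" string is just one guess at what was intended.

```r
traits <- c("vigor", "senes")

# String as pasted in the question: select( is opened but never closed
broken <- paste0('rowMeans(select(., starts_with("', traits, '"), na.rm = TRUE)')

# Same string with one extra ")" so that select() is closed before na.rm
fixed <- paste0('rowMeans(select(., starts_with("', traits, '")), na.rm = TRUE)')

# Hypothetical helper: TRUE if the string is syntactically valid R code
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

broken_ok <- vapply(broken, parses_ok, logical(1), USE.NAMES = FALSE)
fixed_ok <- vapply(fixed, parses_ok, logical(1), USE.NAMES = FALSE)

print(broken_ok)
print(fixed_ok)
```

This only checks syntax. It does not evaluate the calls, so it says nothing about whether the result is what you want inside mutate.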

Pasting strings together is always fraught with difficulties. rlang provides you with a rich toolkit for working with raw expressions, so in your example you can do something like this:

library(rlang)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(42)

df1 <- tibble(PLTID = 1:10, 
              vigor_M1_pred = runif(10), 
              vigor_M2_pred = runif(10), 
              vigor_M3_pred = runif(10), 
              senes_M1_pred = runif(10), 
              senes_M2_pred = runif(10), 
              senes_M3_pred = runif(10))


traits <- c("vigor", "senes")
mats <- c("M1", "M2", "M3")


# funs2 <-  setNames(paste('rowMeans(select(., starts_with("', traits, '"), na.rm = TRUE)', sep = ""), paste0(traits, "_avg"))
funs2 <- purrr::map(traits, function(trait){
  rlang::expr(rowMeans(select(., starts_with(!!trait)), na.rm = TRUE))
}) %>%
  purrr::set_names(paste0(traits, "_avg"))

# splice the named expressions into mutate(); no string parsing needed
df1 %>% mutate(., !!!funs2)
#> # A tibble: 10 x 9
#>    PLTID vigor_M1_pred vigor_M2_pred vigor_M3_pred senes_M1_pred
#>    <int>         <dbl>         <dbl>         <dbl>         <dbl>
#>  1     1         0.915         0.458        0.904        0.738  
#>  2     2         0.937         0.719        0.139        0.811  
#>  3     3         0.286         0.935        0.989        0.388  
#>  4     4         0.830         0.255        0.947        0.685  
#>  5     5         0.642         0.462        0.0824       0.00395
#>  6     6         0.519         0.940        0.514        0.833  
#>  7     7         0.737         0.978        0.390        0.00733
#>  8     8         0.135         0.117        0.906        0.208  
#>  9     9         0.657         0.475        0.447        0.907  
#> 10    10         0.705         0.560        0.836        0.612  
#> # … with 4 more variables: senes_M2_pred <dbl>, senes_M3_pred <dbl>,
#> #   vigor_avg <dbl>, senes_avg <dbl>

Created on 2019-09-17 by the reprex package (v0.3.0)
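
To see what the expression approach builds before it is spliced into mutate, you can print the list as text. This is a minimal sketch that uses lapply instead of purrr::map (an arbitrary substitution to keep dependencies down) and rlang::expr_text to turn each call back into a string. The variable name built is only for illustration.

```r
library(rlang)

traits <- c("vigor", "senes")

# Build one unevaluated call per trait; !! inlines the trait string
funs2 <- lapply(traits, function(trait) {
  expr(rowMeans(select(., starts_with(!!trait)), na.rm = TRUE))
})
names(funs2) <- paste0(traits, "_avg")

# Convert each call to text for inspection
built <- vapply(funs2, expr_text, character(1))
print(built)
```

Each element is a call object rather than a string, so there is no quoting or parenthesis bookkeeping to get wrong, and the list names become the new column names when spliced with !!!.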


So my issue was going the string route and not trying to make the function call an expression.

Very nice solution. Works well. Cheers.

library(tidyr)
library(dplyr)

set.seed(42)

df1 <- tibble(PLTID = 1:10, 
              vigor_M1_pred = runif(10), 
              vigor_M2_pred = runif(10), 
              vigor_M3_pred = runif(10), 
              senes_M1_pred = runif(10), 
              senes_M2_pred = runif(10), 
              senes_M3_pred = runif(10))


traits <- c("vigor", "senes")
mats <- c("M1", "M2", "M3")

df2 <- df1 %>% 
  gather("tag", "value", 2:ncol(df1)) %>% 
  separate(tag, c("trait", "material", "pred"), sep = "_") %>% 
  group_by(trait, material) %>% 
  summarise(mean = mean(value)) %>% 
  ungroup()

df2

#Easy to read solution!

The challenge here is that the variables "trait" and "material" are encoded in the column names, so the data are not tidy.

I simply tidied the data.

  1. Gathered the column names into a tag column, which then holds all of that information.
  2. Split the tag column into trait and material columns. If the tag column is more complex, you can always use case_when and str_detect (from the stringr package), for example,
...%>%
    mutate(trait = case_when(
        str_detect(tag, "vig") ~ "vigor",
        str_detect(tag, "sene") ~ "senes"
))
  3. dplyr has elegant and powerful group-split-join operations. In conjunction with summarise, you will never need to use rowMeans or rowSums again!
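
Note that grouping by trait and material gives one mean per trait and material across all plots. If the goal is the per-plot averages from the original question (vigor_avg and senes_avg for each PLTID), the same tidy strategy can be sketched as below. This assumes tidyr 1.0.0 or later for pivot_longer and pivot_wider; the names avgs and df_avg are made up for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(42)

df1 <- tibble(PLTID = 1:10,
              vigor_M1_pred = runif(10),
              vigor_M2_pred = runif(10),
              vigor_M3_pred = runif(10),
              senes_M1_pred = runif(10),
              senes_M2_pred = runif(10),
              senes_M3_pred = runif(10))

avgs <- df1 %>%
  # split each column name into its three parts while reshaping to long form
  pivot_longer(-PLTID,
               names_to = c("trait", "material", "pred"),
               names_sep = "_") %>%
  # one mean per plot and trait, averaging over materials
  group_by(PLTID, trait) %>%
  summarise(avg = mean(value, na.rm = TRUE)) %>%
  ungroup() %>%
  # back to one row per plot, with columns named <trait>_avg
  mutate(trait = paste0(trait, "_avg")) %>%
  pivot_wider(names_from = trait, values_from = avg)

# attach the averages to the original wide table
df_avg <- left_join(df1, avgs, by = "PLTID")
df_avg
```

The traits and materials are never listed by hand, so this should keep working when their number changes, as long as the column names follow the trait_material_pred pattern.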

Chenxin,
Thanks for your response too. After posting I did try and do this using pivot_longer/wider (gather/spread) and summarize. I originally wanted to use rowMeans because I couldn't do gather/spread due to it being a spatial tibble. https://github.com/r-spatial/sf/issues/1149
However, removing the geometry, isolating important columns and applying this strategy would have been an equally good solution.

The column name coding was on purpose, because the names are created by a previous map2(mutate) function that generates the input dataframe. My thinking was that standardizing the names lets one use the select helpers to isolate these columns and run row-wise functions on them.
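
For what it's worth, with standardized names like these the same row-wise averages can also be computed without any tidy evaluation, by looping over the traits and selecting columns by prefix. This is a minimal sketch on a plain tibble (not tested on a spatial one); df_avg is a name made up for illustration.

```r
library(dplyr)

set.seed(42)

df1 <- tibble(PLTID = 1:10,
              vigor_M1_pred = runif(10),
              vigor_M2_pred = runif(10),
              vigor_M3_pred = runif(10),
              senes_M1_pred = runif(10),
              senes_M2_pred = runif(10),
              senes_M3_pred = runif(10))

traits <- c("vigor", "senes")

df_avg <- df1
for (trait in traits) {
  # select from the original df1 so a newly added *_avg column is never averaged
  cols <- select(df1, starts_with(trait))
  df_avg[[paste0(trait, "_avg")]] <- rowMeans(cols, na.rm = TRUE)
}

df_avg
```

This is less compact than splicing expressions into mutate, but it may be easier to debug when the set of traits changes.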

