I'm trying to mutate several columns whose column names have the same prefix and a number as suffix. Each column is mutated based on a value in another column with the corresponding suffix in its name. I can run mutate
using each pair of columns explicitly. However, I'd prefer, if possible, to use a single across
operation, but can't figure out how to make it work.
Below is an example: In the data below, when there's a missing value in a units
column, I want to put a "U"
in the corresponding status
column. So, in this case, units1
has a missing value in row 2, so, in status1
we want row 2 to change from "P"
to "U"
, and likewise for row 4 of the units3
, status3
pair.
library(tidyverse)
# Fake data
set.seed(2)
d = bind_cols(
paste0("status", 1:3) %>%
set_names() %>%
map_df(~sample(c("P","F"), 5, replace=TRUE)),
paste0("units", 1:3) %>%
set_names() %>%
map_df(~sample(c(0:2,NA), 5, prob=c(4,4,4,1), replace=TRUE))
)
d
#> # A tibble: 5 x 6
#> status1 status2 status3 units1 units2 units3
#> <chr> <chr> <chr> <int> <int> <int>
#> 1 P F P 1 1 2
#> 2 P P P NA 2 0
#> 3 F P F 0 1 2
#> 4 F P P 2 0 NA
#> 5 F F P 0 2 0
This can be done by mutating each column separately, which I'd like to avoid:
# Option 1: The hard way
d %>%
mutate(status1 = case_when(is.na(units1) ~ "U",
TRUE ~ status1),
status2 = case_when(is.na(units2) ~ "U",
TRUE ~ status2),
status3 = case_when(is.na(units3) ~ "U",
TRUE ~ status3))
#> # A tibble: 5 x 6
#> status1 status2 status3 units1 units2 units3
#> <chr> <chr> <chr> <int> <int> <int>
#> 1 P F P 1 1 2
#> 2 U P P NA 2 0
#> 3 F P F 0 1 2
#> 4 F P U 2 0 NA
#> 5 F F P 0 2 0
It can also be done by pivoting to long, mutating once, then pivoting back to wide. Even though it takes just one mutate, it is still verbose and has the added complexity of having to figure out a somewhat mind-boggling pivoting operation.
# Option 2: pivot_longer
d %>%
rename_all(~gsub("([1-3])", "_\\1", .)) %>%
pivot_longer(everything(),
names_to=c(".value", "seq"),
names_sep="_") %>%
mutate(status = case_when(is.na(units) ~ "U",
TRUE ~ status)) %>%
pivot_wider(names_from=seq, values_from=c(status, units),
names_sep="") %>%
unnest()
#> # A tibble: 5 x 6
#> status1 status2 status3 units1 units2 units3
#> <chr> <chr> <chr> <int> <int> <int>
#> 1 P F P 1 1 2
#> 2 U P P NA 2 0
#> 3 F P F 0 1 2
#> 4 F P U 2 0 NA
#> 5 F F P 0 2 0
Okay, now for the across
version that I'm trying to figure out:
I use across(matches("^status")
, to operate on each of the three status
columns. The mutate operation needs to successively check the corresponding units
column for each status
column. That is, when status1
is being mutated, it needs to use units1
for the is.na()
condition, and so on.
My thought was to get the numeric suffix for status
using cur_column()
and paste that onto "units" to get a string like "units1"
. But then that needs to be turned into a name for is.na()
to operate on. Below are my two failed attempts. (For these examples, I've created new updated columns to compare with the original columns.)
My question is how to do this correctly. Alternatively, is there an easier/better approach than this?
# Option 3: mutate with across [NOT WORKING]
# First try
d %>%
mutate(across(matches("^status"),
~case_when(is.na(!!as.name(paste0("units", str_extract(cur_column(), "[1-3]$")))) ~ "U",
TRUE ~ .),
.names="{.col}_upd"))
#> Error: `cur_column()` must only be used inside `across()`.
# Second try
d %>%
mutate(across(matches("^status"),
~case_when(is.na(!!rlang::parse_expr(paste0("units", str_extract(cur_column(), "[1-3]$")))) ~ "U",
TRUE ~ .),
.names="{.col}_upd"))
#> Error: `cur_column()` must only be used inside `across()`.
One more thing: In the code above, both of my attempts errored. However, (strangely) this code sometimes runs without error, but gives incorrect output, as shown below.
> d %>%
+ mutate(across(matches("^status"),
+ ~case_when(is.na(!!as.name(paste0("units", str_extract(cur_column(), "[1-3]$")))) ~ "U",
+ TRUE ~ .),
+ .names="{.col}_upd"))
# A tibble: 5 x 9
status1 status2 status3 units1 units2 units3 status1_upd status2_upd status3_upd
<chr> <chr> <chr> <int> <int> <int> <chr> <chr> <chr>
1 P F P 1 1 2 P F P
2 P P P NA 2 0 P P P
3 F P F 0 1 2 F P F
4 F P P 2 0 NA U U U
5 F F P 0 2 0 F F P
>
> d %>%
+ mutate(across(matches("^status"),
+ ~case_when(is.na(!!rlang::parse_expr(paste0("units", str_extract(cur_column(), "[1-3]$")))) ~ "U",
+ TRUE ~ .),
+ .names="{.col}_upd"))
# A tibble: 5 x 9
status1 status2 status3 units1 units2 units3 status1_upd status2_upd status3_upd
<chr> <chr> <chr> <int> <int> <int> <chr> <chr> <chr>
1 P F P 1 1 2 P F P
2 P P P NA 2 0 P P P
3 F P F 0 1 2 F P F
4 F P P 2 0 NA U U U
5 F F P 0 2 0 F F P
Session Info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.6
[8] ggplot2_3.3.3 tidyverse_1.3.0 reprex_1.0.0 testthat_3.0.1 devtools_2.3.2 usethis_2.0.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 lubridate_1.7.9.2 prettyunits_1.1.1 ps_1.5.0 assertthat_0.2.1 rprojroot_2.0.2
[7] utf8_1.1.4 R6_2.5.0 cellranger_1.1.0 backports_1.2.1 httr_1.4.2 pillar_1.4.7
[13] rlang_0.4.10 readxl_1.3.1 rstudioapi_0.13 callr_3.5.1 desc_1.2.0 munsell_0.5.0
[19] broom_0.7.4 compiler_4.0.3 modelr_0.1.8 pkgconfig_2.0.3 pkgbuild_1.2.0 tidyselect_1.1.0
[25] fansi_0.4.2 crayon_1.4.1 dbplyr_2.1.0 withr_2.4.1 grid_4.0.3 jsonlite_1.7.2
[31] gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1 magrittr_2.0.1 scales_1.1.1 cli_2.3.0
[37] stringi_1.5.3 cachem_1.0.1 fs_1.5.0 remotes_2.2.0 xml2_1.3.2 ellipsis_0.3.1
[43] generics_0.1.0 vctrs_0.3.6 tools_4.0.3 glue_1.4.2 hms_1.0.0 processx_3.4.5
[49] pkgload_1.1.0 fastmap_1.1.0 yaml_2.2.1 colorspace_2.0-0 sessioninfo_1.1.1 rvest_0.3.6
[55] memoise_2.0.0 haven_2.3.1