Updating data for multiple columns

weetbix_warrior · November 22, 2023, 4:22am

I'm looking for an effective way to update data for multiple columns based on many different conditions.

When the condition is true I want to update multiple columns of a datframe. I'm prettty sure dplyr's case_when function can only return a single column.

See the code below, this is what I'm currently doing, is there a better alternative? .
To achieve the same with dplyr I would need 2x case_when statements, for the below example it doesn't seem like much but my real-world problem is ~20 conditions, and 4 columns. It's always the same columns that need to be updated but it can be different column or combinations of columns with logic in the triggering condition.

mtcars[mtcars$cyl == 8, c("wt","qsec")] <- list(1,2)
mtcars[mtcars$disp > 160, c("wt","qsec")] <- list(3,4)

technocrat · November 22, 2023, 8:47am

Here's an approach illustrated by 7 conditions. Note that the final result may be affected by the order in which the conditions are applied.

# receiving data frame
score <- mtcars[,c("wt","cyl")]

# rows of source data frame meeting conditions
himpg   <- which(mtcars$mpg > 20)
bigcyl  <- which(mtcars$cyl == 8)
bigdisp <- which(mtcars$disp > 300)
lowhp   <- which(mtcars$hp < 100)
lowrat  <- which(mtcars$drat < 3)
midwt   <- which(mtcars$wt > 3 & mtcars$wt < 4)
fastq   <- which(mtcars$qsec > 20)

# iterable for conditions
l <- list(himpg,bigcyl,bigdisp,lowhp,lowrat,midwt,fastq)

# replacements
r <- list(c(1,2),c(3,4),c(5,6),c(7,8),c(9,10),c(11,12),c(13,14))

# Apply replacements to score data frame
for (i in seq_along(l)) score[l[[i]],] <- r[[i]]

score
#>                        wt cyl
#> Mazda RX4            1.00   1
#> Mazda RX4 Wag        2.00   2
#> Datsun 710           7.00   8
#> Hornet 4 Drive      11.00  11
#> Hornet Sportabout   12.00  12
#> Valiant             13.00  14
#> Duster 360          12.00  12
#> Merc 240D           11.00  11
#> Merc 230            14.00  13
#> Merc 280            11.00  11
#> Merc 280C           12.00  12
#> Merc 450SE           3.00   3
#> Merc 450SL          11.00  11
#> Merc 450SLC         12.00  12
#> Cadillac Fleetwood  10.00   9
#> Lincoln Continental  6.00   5
#> Chrysler Imperial    5.00   6
#> Fiat 128             8.00   7
#> Honda Civic          7.00   8
#> Toyota Corolla       8.00   7
#> Toyota Corona       13.00  14
#> Dodge Challenger    11.00  11
#> AMC Javelin         12.00  12
#> Camaro Z28          11.00  11
#> Pontiac Firebird    12.00  12
#> Fiat X1-9            8.00   7
#> Porsche 914-2        7.00   8
#> Lotus Europa         1.00   1
#> Ford Pantera L      11.00  11
#> Ferrari Dino         2.77   6
#> Maserati Bora       12.00  12
#> Volvo 142E           2.00   2

^{Created on 2023-11-22 with reprex v2.0.2}

nirgrahamuk · November 22, 2023, 11:04am

Here is a function that lets you do a one liner mutate that achieves your same result

library(tidyverse)
mtcars_original <- mtcars

mtcars_mod_1 <- mtcars_original

mtcars_mod_1[mtcars_mod_1$cyl == 8, c("wt","qsec")] <- list(1,2)


do_one <- function(data,cond,vars,vals){
  cond <- substitute(cond)
  result_env <- new.env()
  result_env$data <- data
  
   walk2(vars,vals,\(var,val){
    result_env$data <-  mutate(result_env$data,across(all_of(var),
                       \(x)case_when(eval(cond) & cur_column()==var~val,
                                     .default = x)))
  })
   result_env$data
}


mtcars_mod_2 <- do_one(mtcars_original,cyl==8,c("wt","qsec"),list(1,2))

identical(mtcars_mod_1,mtcars_mod_2)

weetbix_warrior · November 22, 2023, 10:05pm

Thanks for your clean example of a method to achieve this!

I think for my example (in the real world) I need to keep the new values as close to each condition as possible so it's easy to tell for say it might be client X what we are adjusting certain values to be , rather than checking which object the condition is assigned to, it's position in the first list and then having to correctly read the same position in the second list for the values.

technocrat · November 22, 2023, 10:18pm

Yeah, I thought that might be a problem. My suggestion then would be to order the priority of conditions and test in a for() loop that breaks once the condition is met.

weetbix_warrior · November 23, 2023, 3:46am

Thanks for this, it definitely works, I have no idea how it works LOL.

I will study this for some time, but will probably never fully understand.

system · January 4, 2024, 3:47am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.