Best way to replace values conditionally

jonas2 · April 14, 2022, 2:06pm

Hello guys,
I wonder what the best way is to replace certain values in a data frame with some other values. Let's say I have a variable that can have a few distinct values (e.g., strings) such as "Treatment 1", "Treatment 2", "Treatment 3". Now I want to efficiently replace them with values "1", "2", "3".

Generally, I do an "ifelse" case for each of those, but this doesn't feel ideal.

arthur.t · April 14, 2022, 2:09pm

You can do an ifelse if it's a simple case. Otherwise you should create a translation data frame with columns for old value and new value, and join this to your data frame. The benefit of this is to store the logic as data that is easy to communicate and share, and change the logic in data instead of code as needed. After the join, you may need a new_value <- if_else(is.na(new_value), old_value, new_value) to handle the rows that don't have a replacement.
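
A minimal sketch of the join-plus-fallback idea, assuming dplyr and tibble are installed. The data frame `df`, the column names `old_value` and `new_value`, and the "Control" value are made up for illustration; they are not from the original question.

```r
library(dplyr)
library(tibble)

# Hypothetical data: "Control" has no entry in the translation table
df <- tibble(old_value = c("Treatment 1", "Treatment 2", "Control"))

# Translation table: the recoding logic lives in data, not in code
translate <- tibble(
  old_value = c("Treatment 1", "Treatment 2", "Treatment 3"),
  new_value = c("1", "2", "3")
)

result <- df %>%
  left_join(translate, by = "old_value") %>%
  # Rows without a match get NA from the join; fall back to the original value
  mutate(new_value = if_else(is.na(new_value), old_value, new_value))

result
```

Here the unmatched "Control" row keeps its original value instead of becoming NA.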

jonas2 · April 14, 2022, 2:11pm

So, I cannot really prevent the ugly structure of having many rows to replace the values one-by-one. I thought about something like a match case in other programming languages, but then I'd need to pass the data frame into a dedicated function I guess...
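
For what it's worth, something close to a match/switch over a whole column can be sketched in base R with a named vector used as a lookup table. The vector names `treatment`, `lookup`, and `recoded` are only illustrative, and this assumes every value in the column has an entry in the lookup (values without one come back as NA).

```r
# Hypothetical column of values to recode
treatment <- c("Treatment 1", "Treatment 2", "Treatment 1", "Treatment 3")

# Named vector: names are the old values, elements are the new values
lookup <- c("Treatment 1" = "1", "Treatment 2" = "2", "Treatment 3" = "3")

# Index the lookup by the old values; unname() drops the carried-over names
recoded <- unname(lookup[treatment])
recoded
```

The mapping is written once, and the replacement itself is a single indexing step rather than one ifelse per value.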

arthur.t · April 14, 2022, 2:44pm

I think the join strategy I described will prevent "one-by-one replacement in code".

For your original example, if you had a data frame

translate <- tibble(old = c("Treatment1", "Treatment2", "Treatment3"), new = c("1", "2", "3"))

You can join this to your data frame to populate all the replacement values in one line of code.

jonas2 · April 14, 2022, 3:35pm

Wait, sorry, I think there is a misunderstanding. It's not column names but values within a column. So a column could have the name "Treatment" and the values are as described above. Now I want to replace the values in that column.

FJCC · April 14, 2022, 3:45pm

Here are examples of two methods. The first one is the one suggested by @arthur.t. It has the advantage of clearly showing the intended replacements and preserving the original column. In the second method, I take advantage of the fact that the replacement amounts to extracting the numeric characters from the original values. I realize that might not actually be true in your real data.

library(dplyr)
library(tibble)
DF <- tibble(Treatment = c("Treatment 1", "Treatment 2", "Treatment 1", "Treatment 3"),
                 Value = c(32,14,35,24))
DF
#> # A tibble: 4 × 2
#>   Treatment   Value
#>   <chr>       <dbl>
#> 1 Treatment 1    32
#> 2 Treatment 2    14
#> 3 Treatment 1    35
#> 4 Treatment 3    24
translate <- tibble(old = c("Treatment 1", "Treatment 2", "Treatment 3"), new = c("1", "2", "3"))

#arthur.t method
DF <- left_join(DF, translate, by = c(Treatment = "old"))
DF
#> # A tibble: 4 × 3
#>   Treatment   Value new  
#>   <chr>       <dbl> <chr>
#> 1 Treatment 1    32 1    
#> 2 Treatment 2    14 2    
#> 3 Treatment 1    35 1    
#> 4 Treatment 3    24 3

#method with stringr
library(stringr)
DF <- tibble(Treatment = c("Treatment 1", "Treatment 2", "Treatment 1", "Treatment 3"),
             Value = c(32,14,35,24))
DF <- DF %>% mutate(Treatment = str_extract(Treatment, "\\d+"))
DF
#> # A tibble: 4 × 2
#>   Treatment Value
#>   <chr>     <dbl>
#> 1 1            32
#> 2 2            14
#> 3 1            35
#> 4 3            24

Created on 2022-04-14 by the reprex package (v0.2.1)

rene_at_coco · April 15, 2022, 7:42pm

Now I'm curious about use cases for the arthur.t translational data frame method. I often find myself in situations where I have to update how I recode my data.
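
One way this could look, as a rough sketch: when the recoding rules change, only the translation table is edited and the join code stays the same. The "Treatment 4" row and the small `DF` below are hypothetical, and dplyr and tibble are assumed to be installed.

```r
library(dplyr)
library(tibble)

# Translation table as in the example above
translate <- tibble(old = c("Treatment 1", "Treatment 2", "Treatment 3"),
                    new = c("1", "2", "3"))

# Hypothetical update: a fourth treatment shows up in later data
translate <- add_row(translate, old = "Treatment 4", new = "4")

DF <- tibble(Treatment = c("Treatment 1", "Treatment 4"), Value = c(32, 18))

# The join code is unchanged; only the table grew by one row
recoded <- left_join(DF, translate, by = c(Treatment = "old"))
recoded
```

The same table could also be kept in a CSV file and read in, so that the recoding can be reviewed and changed without touching the script.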
