Hello guys,
I wonder what the best way is to replace certain values in a data frame with other values. Let's say I have a variable that can take a few distinct values (e.g., strings) such as "Treatment 1", "Treatment 2", "Treatment 3". Now I want to efficiently replace them with the values "1", "2", "3".
Generally, I write an "ifelse" case for each of those, but this doesn't feel ideal.
You can do an ifelse if it's a simple case. Otherwise you should create a translation data frame with columns for old value and new value, and join this to your data frame. The benefit is that the logic is stored as data, which is easy to communicate and share, and you can change the logic in the data instead of in the code as needed. After the join, you may need a new_value <- if_else(is.na(new_value), old_value, new_value) to handle the rows that don't have a replacement.
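A minimal sketch of the translation data frame approach, assuming dplyr is available. The example data, the "Control" value, and the names df, translation, and recoded are made up for illustration. Note that when joining with by = c("Treatment" = "old_value"), dplyr keeps only the Treatment column, so the fallback uses Treatment in place of old_value:

```r
library(dplyr)

# Hypothetical example data; "Control" deliberately has no replacement.
df <- data.frame(
  Treatment = c("Treatment 1", "Treatment 3", "Control", "Treatment 2"),
  stringsAsFactors = FALSE
)

# Translation table: the recoding logic lives here as data, not code.
translation <- data.frame(
  old_value = c("Treatment 1", "Treatment 2", "Treatment 3"),
  new_value = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

recoded <- df %>%
  # Unmatched rows get NA in new_value after the left join.
  left_join(translation, by = c("Treatment" = "old_value")) %>%
  # Fall back to the original value where no replacement exists.
  mutate(new_value = if_else(is.na(new_value), Treatment, new_value))

print(recoded)
```

Changing the recoding then only means editing rows of translation, which could just as well be read from a CSV file.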
So, I cannot really prevent the ugly structure of having many rows to replace the values one-by-one. I thought about something like a match case in other programming languages, but then I'd need to pass the data frame into a dedicated function I guess...
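For what it's worth, something close to a match case can be sketched in base R with a named lookup vector, without passing the data frame into a dedicated function. This is only an illustration: the example data and the column name Treatment_code are made up, and it assumes the column is a character vector, not a factor.

```r
# Hypothetical example data.
df <- data.frame(
  Treatment = c("Treatment 2", "Treatment 1", "Treatment 3", "Treatment 1"),
  stringsAsFactors = FALSE
)

# Names are the old values, elements are the new values.
lookup <- c("Treatment 1" = "1", "Treatment 2" = "2", "Treatment 3" = "3")

# Indexing by name looks up every row at once; values missing
# from the lookup come back as NA.
df$Treatment_code <- unname(lookup[df$Treatment])

print(df)
```

The mapping still has one entry per value, but it sits in a single vector instead of a chain of nested ifelse calls.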
Wait, sorry, I think there is a misunderstanding. It's not column names but values within a column. So a column could have the name "Treatment" and the values are as described above. Now I want to replace the values in that column.
Here are examples of two methods. The first one is the one suggested by @arthur.t. It has the advantage of clearly showing the intended replacements and preserving the original column. In the second method, I take advantage of the fact that the replacement amounts to extracting the numeric characters from the original values. I realize that might not actually be true in your real data.
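A minimal sketch of what these two methods could look like, not necessarily the exact code behind this post. It assumes dplyr is available; the example data and the names translation, Code, method1, and method2 are hypothetical.

```r
library(dplyr)

# Hypothetical example data.
df <- data.frame(
  Treatment = c("Treatment 1", "Treatment 2", "Treatment 3", "Treatment 2"),
  stringsAsFactors = FALSE
)

# Method 1: join a translation table; the original column is preserved
# and the intended replacements are spelled out explicitly.
translation <- data.frame(
  Treatment = c("Treatment 1", "Treatment 2", "Treatment 3"),
  Code = c("1", "2", "3"),
  stringsAsFactors = FALSE
)
method1 <- left_join(df, translation, by = "Treatment")

# Method 2: keep only the digits of each value. This works only
# because the new value happens to be the number embedded in the old one.
method2 <- df %>%
  mutate(Code = gsub("[^0-9]", "", Treatment))

print(method1)
print(method2)
```

The second method is shorter, but it silently depends on the naming pattern of the values; the first keeps working when the replacements are arbitrary.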
Now I'm curious about use cases for arthur.t's translation data frame method. I often find myself in situations where I have to update how I recode my data.