Glad you worked out a solution! Here are a few alternative ideas that might be a bit more streamlined. When recoding variables like this, I personally strongly favor maximizing readability and future maintainability — I don't want it to be a mystery to future-me (or anybody else) where and how the data coding decisions are made.
Set up test data frame
library(tidyverse)

mrgb_trus <- data.frame(
  MRGB_gleason = c("3+4", "4", "3+4", "4+4", "3+3", NA, "3+4", "3+3", NA, "4+3",
    "3+3", "3+4", "3+4", NA, "3", "3+4", NA, NA, NA, NA, "4+3", "3+4", "3+3",
    "4+3", "4+4", "4+5", "3+3", "4+3", "4+3", NA, NA, "3+3", "4+4", "3+4", "4+5",
    "3+3", "5+4", NA, NA, "3+4", "4+3", NA, "3+3", "4+3", "3+4", "3+4", "3+4", NA,
    "4+4", "4+3", "3+4", "3+4"),
  stringsAsFactors = FALSE)
# Recode with case_when(): one explicit rule per Gleason score.
# Any score not listed below would come out as NA.
mrgb_trus_case_when <- mrgb_trus %>%
  mutate(
    MRGGG = case_when(
      is.na(MRGB_gleason) ~ "0",
      MRGB_gleason == "3" ~ "1",
      MRGB_gleason == "4" ~ "1",
      MRGB_gleason == "3+3" ~ "1",
      MRGB_gleason == "3+4" ~ "2",
      MRGB_gleason == "4+3" ~ "3",
      MRGB_gleason == "4+4" ~ "4",
      MRGB_gleason == "4+5" ~ "5",
      MRGB_gleason == "5+4" ~ "5",
      MRGB_gleason == "5+5" ~ "5"
    )
  )
Another option is a lookup table that you join onto the data. It is defined inline with tribble() here, but to maximize maintainability you could store it as a CSV and read it in as needed. That way nobody has to go digging around inside the code to add translations, and the CSV itself can be stored along with other project metadata.
mrgb_lookup <- tribble(
  ~ gleas_score, ~ gleas_grd_grp,
  NA,    "0",
  "3",   "1",
  "4",   "1",
  "3+3", "1",
  "3+4", "2",
  "4+3", "3",
  "4+4", "4",
  "4+5", "5",
  "5+4", "5",
  "5+5", "5"
)

mrgb_trus_inner_join <- mrgb_trus %>%
  inner_join(mrgb_lookup, by = c("MRGB_gleason" = "gleas_score")) %>%
  rename("MRGGG" = "gleas_grd_grp") # new col will bring along name from lookup table
Both of these methods produce the same results as your solution:
mrgb_trus_3step <- mrgb_trus %>%
  mutate(
    MRGGG = str_replace_all(
      MRGB_gleason,
      c("3\\+3" = "1", "3\\+4" = "2",
        "4\\+3" = "3", "4\\+4" = "4",
        "4\\+5" = "5", "5\\+4" = "5",
        "5\\+5" = "5")
    ),
    # use "0" (character) rather than 0 so the column type is not left to implicit coercion
    MRGGG = replace(MRGGG, is.na(MRGGG), "0"),
    MRGGG = replace(MRGGG, MRGB_gleason == "3" | MRGB_gleason == "4", "1")
  )
identical(
  mrgb_trus_3step$MRGGG,
  mrgb_trus_case_when$MRGGG
)
#> [1] TRUE

identical(
  mrgb_trus_3step$MRGGG,
  mrgb_trus_inner_join$MRGGG
)
#> [1] TRUE
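If you go the CSV route for the lookup table, a rough sketch might look like the following. The file name `gleason_lookup.csv`, the temporary directory, and the objects `lookup_path`, `mrgb_lookup_csv`, and `scores` are made up for illustration; in a real project the CSV would live wherever you keep your metadata. I've used left_join() here instead of inner_join() so that a score missing from the lookup shows up as NA rather than silently dropping the row, and the anti_join() at the end lists any such scores.

```r
library(tidyverse)

# Hypothetical location for the lookup file (assumption: adjust to your project)
lookup_path <- file.path(tempdir(), "gleason_lookup.csv")

# Write the lookup table once; after that the CSV can be edited by hand
tribble(
  ~gleas_score, ~gleas_grd_grp,
  NA,    "0",
  "3",   "1",
  "4",   "1",
  "3+3", "1",
  "3+4", "2",
  "4+3", "3",
  "4+4", "4",
  "4+5", "5",
  "5+4", "5",
  "5+5", "5"
) %>%
  write_csv(lookup_path)

# Read both columns as character so "0".."5" are not parsed as numbers
mrgb_lookup_csv <- read_csv(
  lookup_path,
  col_types = cols(.default = col_character())
)

# Small stand-in for the real data
scores <- tibble(MRGB_gleason = c("3+4", NA, "4+3", "5+5"))

# left_join() keeps every row of `scores`; NA matches NA by default in dplyr joins
recoded <- scores %>%
  left_join(mrgb_lookup_csv, by = c("MRGB_gleason" = "gleas_score")) %>%
  rename(MRGGG = gleas_grd_grp)

# Scores that have no entry in the lookup table (zero rows means full coverage)
unmatched <- scores %>%
  anti_join(mrgb_lookup_csv, by = c("MRGB_gleason" = "gleas_score"))
```

Adding a new translation is then just a new row in the CSV, with no change to the recoding code.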
Notes:
- As seen above, you can put all your mutate steps in a single call to mutate(). It applies the changes sequentially, so later steps in the same call get the updated values from earlier steps.
- Eventually, you probably want to convert your Gleason Grade Group values into an ordered factor
- You might be interested in the questionr package. It has some really neat interactive RStudio add-ins that help you build variable recoding code. See the vignette here: https://juba.github.io/questionr/articles/recoding_addins.html
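For the ordered factor conversion mentioned in the notes, something like this should work. It is only a sketch: the small `ggg_demo` tibble is made up for illustration, and it assumes you want "0" (no Gleason score recorded) to sort as the lowest level, which may or may not suit your analysis.

```r
library(dplyr)

# Made-up grade group values standing in for the recoded MRGGG column
ggg_demo <- tibble(MRGGG = c("2", "0", "3", "1", "5", "4", "2"))

ggg_demo <- ggg_demo %>%
  mutate(
    # Assumption: "0" is treated as the lowest level of the ordering
    MRGGG = factor(MRGGG, levels = as.character(0:5), ordered = TRUE)
  )

# Ordered factors support comparisons against a level
high_grade <- ggg_demo %>%
  filter(MRGGG >= "3")
```

Once the column is an ordered factor, sorting, plotting, and comparisons follow the grade order instead of treating the values as plain strings.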