For my project, I am working with the National Cancer Data Base.
I am trying to clean up a variable for the T stage of a cancer (based on the T, N, M staging guidelines) by combining detailed stages (e.g. "1A", "1A1", "1A2") into simple stages ("1", "2", "3", "4").
How can I make these changes in my uploaded data set?
Hi Chris, I see your example, but I still haven't figured out exactly what "modification" you want to make to the data. I guess there are two steps in your ideal process:
extract the rows based on stages named 1A#
simplify the stage names
The following code might work (assuming your data is stored in df):
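A minimal sketch of these two steps, assuming the stage is held in a character column named TNM_CLIN_T and that dplyr is available. The sample rows are made up for illustration and are not taken from the real data:

```r
library(dplyr)

# Hypothetical sample data; the real data set is assumed to have a
# character column named TNM_CLIN_T.
df <- tibble(TNM_CLIN_T = c("1A", "1A1", "1A2", "2", "c3", ""))

# Step 1: keep only the rows whose stage starts with "1A".
stage_1a <- df %>%
  filter(grepl("^1A", TNM_CLIN_T))

# Step 2: collapse every 1A sub-stage to the simple stage "1".
stage_1a <- stage_1a %>%
  mutate(stage.simplify = sub("^1A.*$", "1", TNM_CLIN_T))
```

If you want to keep all rows rather than only the 1A ones, you can skip the filter step and apply the replacement to the whole column.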
Hi Chris, since the replacements for stage 1 are complex, I suggest you first build a lookup dictionary, which is basically a data.frame, and then use a join to match in the simplified stages. The lookup dictionary looks like:
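As a sketch, such a lookup table could be built as below. Only the 1A sub-stages named in the question are mapped, and mapping them to "1" is an assumption based on the question:

```r
library(dplyr)

# Lookup table: one row per detailed stage, with its simplified stage.
# Assumed mapping; only the 1A sub-stages from the question are covered.
stage_look_up <- tibble(
  TNM_CLIN_T     = c("1A", "1A1", "1A2"),
  stage.simplify = c("1", "1", "1")
)
```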
df %>% left_join(stage_look_up,by = 'TNM_CLIN_T')
# A tibble: 8 x 2
  TNM_CLIN_T stage.simplify
  <chr>      <chr>
1 blank      NA
2 cX         NA
3 blank      NA
4 c4         NA
5 c3         NA
6 c2         NA
7 1A2        1
8 c1         NA
I didn't set simplified stages for "c1", "c2", ..., so many NAs remain in the result. You will have to finish setting them.
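To illustrate finishing the table, here is a hedged sketch that extends the lookup with the clinical stages. The mapping of "c1" to "1", "c2" to "2" and so on is my assumption, so check it against your staging guidelines. The sample df simply mirrors the values printed above, with "blank" treated as a literal string:

```r
library(dplyr)

# Sample data mirroring the printed result above (hypothetical values).
df <- tibble(TNM_CLIN_T = c("blank", "cX", "blank", "c4", "c3", "c2", "1A2", "c1"))

# Extended lookup table; the c1..c4 rows are an assumed mapping.
stage_look_up <- tibble(
  TNM_CLIN_T     = c("1A", "1A1", "1A2", "c1", "c2", "c3", "c4"),
  stage.simplify = c("1",  "1",   "1",   "1",  "2",  "3",  "4")
)

# Values without a row in the lookup (here "blank" and "cX") stay NA.
result <- df %>% left_join(stage_look_up, by = "TNM_CLIN_T")
```

Anything you leave out of the lookup table stays NA after the join, which is a convenient way to spot stages you have not mapped yet.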