Hi there,
I’m trying to construct a new variable, but only if certain conditions are TRUE, and this new variable
should pull the value from another variable at a specific position... it’s quite tricky to explain, but
I have prepared some toy data and a working script in this repo:
Here is the script:
library(tidyverse)

df <- readr::read_csv("https://raw.githubusercontent.com/b-rodrigues/mutate_when_predicate/main/example_df.csv")

# Flag the (person, col_a, col_c) groups in which to_fill is not constant
ids <- df %>%
  group_by(person, col_a, col_c) %>%
  summarise(needs_correction = n_distinct(to_fill)) %>%
  filter(needs_correction > 1) %>%
  mutate(needs_correction = TRUE)

# In flagged groups, take to_fill from the row where col_b == "W";
# everywhere else, fall back to the original to_fill
full_join(df, ids) %>%
  group_by(person, col_a, col_c) %>%
  mutate(new_loc = ifelse(all(needs_correction),
                          pull(filter(cur_data(), col_b == "W"), to_fill),
                          NA_character_)) %>%
  ungroup() %>%
  mutate(new_loc2 = coalesce(new_loc, to_fill))
The use case is as follows: for each person in the data, I need to fill the column called to_fill. However, I want to do so only:
1 - where to_fill has more than one unique value by col_a and col_c (basically, for a given individual grouped by col_a and col_c, to_fill is not constant), BUT
2 - only if it is empty where col_b is "S",
3 - and do so by groups formed by col_a and col_c.
After running the code, new_loc2 is the required solution: values of to_fill where col_b equals "S" are replaced by the values where col_b equals "W", and all other values are left as they are.
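To make the expected behaviour concrete, here is a minimal sketch on a made-up tibble. The values in toy are invented for illustration and are not the data in the repo, and w_value is just a helper column name I picked. It writes the same rule as a single grouped mutate. I believe it mirrors the script above (every row in a non-constant group takes the "W" value), but I have only reasoned it through on this toy input:

```r
library(dplyr)

# Made-up data for illustration only (not the CSV from the repo).
toy <- tibble(
  person  = c(1, 1, 1, 1),
  col_a   = c("x", "x", "y", "y"),
  col_c   = c("k", "k", "k", "k"),
  col_b   = c("W", "S", "W", "S"),
  to_fill = c("Paris", NA, "Rome", "Rome")
)

result <- toy %>%
  group_by(person, col_a, col_c) %>%
  mutate(
    # TRUE when to_fill is not constant within the group (NA counts as a value)
    needs_correction = n_distinct(to_fill) > 1,
    # the group's to_fill where col_b == "W"; NA if the group has no "W" row
    w_value = first(to_fill[col_b == "W"]),
    # use the "W" value in flagged groups, otherwise keep to_fill as it is
    new_loc2 = if_else(needs_correction & !is.na(w_value), w_value, to_fill)
  ) %>%
  ungroup()

result$new_loc2
```

In the first group to_fill is not constant ("Paris" and a missing value), so the "S" row takes "Paris" from the "W" row; the second group is constant and is left alone.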
Am I overthinking this? Is there a simpler solution?