Replacing missing values and identifying mutually exclusiveness

Dear all,

I am working with the grouped data in R, and I am a new user. In the following data example, I would like to fill the missing values in "sex" variable, and keep as it is if there was no corresponding data (ie for id=6). In the "diagnosis" variable , some had only one diagnosis and some had multiple diagnosis. So, I would like to group the variable "diagnosis" into "wanted". Pls kindly help me how I can solve this .
The example data is;
d.f <- tribble (
~id, ~sex, ~diagnosis,
1, "M", "A",
1, NA, "B",
1, NA, "C",
2, NA, "A",
2, "F", NA,
2, NA, "A",
3, NA, NA,
3, "M", "A",
3, "M", "B",
4, "F", "C",
5, "F", "B",
6, NA, "A",
7, "M", NA
)
The desired data frame is ;
wanted <- tribble (
~id, ~sex, ~diagnosis,~wanted,
1, "M", "A", "ABC group",
1, "M", "B", "ABC group",
1, "M", "C", "ABC group",
2, "F", "A", "Only A",
2, "F", NA, "Only A",
2, "F", "A", "Only A",
3, "M", NA, "AB group",
3, "M", "A", "AB group",
3, "M", "B", "AB group",
4, "F", "C", "Only C",
5, "F", "B", "Only B",
6, NA, "A", "Only A",
7, "M", NA, "Missing"
)
Regards,
Moon Lu

I think this works for what you need:

library(tidyverse)

d.f <- tribble (
   ~id, ~sex, ~diagnosis,
   1, "M", "A",
   1, NA, "B",
   1, NA, "C",
   2, NA, "A",
   2, "F", NA,
   2, NA, "A",
   3, NA, NA,
   3, "M", "A",
   3, "M", "B",
   4, "F", "C",
   5, "F", "B",
   6, NA, "A",
   7, "M", NA
)

d.f %>%
   group_by(id) %>%
   mutate(
      ngroups=n_distinct(diagnosis, na.rm=TRUE),
      gps=str_flatten(setdiff(unique(diagnosis), NA)),
      wanted=case_when(
         ngroups==1~glue::glue("Only {gps}"),
         ngroups>1~glue::glue("{gps} Group"),
         TRUE~"Missing"
      )
   ) %>%
   ungroup()
#> # A tibble: 13 x 6
#>       id sex   diagnosis ngroups gps   wanted   
#>    <dbl> <chr> <chr>       <int> <chr> <glue>   
#>  1     1 M     A               3 "ABC" ABC Group
#>  2     1 <NA>  B               3 "ABC" ABC Group
#>  3     1 <NA>  C               3 "ABC" ABC Group
#>  4     2 <NA>  A               1 "A"   Only A   
#>  5     2 F     <NA>            1 "A"   Only A   
#>  6     2 <NA>  A               1 "A"   Only A   
#>  7     3 <NA>  <NA>            2 "AB"  AB Group 
#>  8     3 M     A               2 "AB"  AB Group 
#>  9     3 M     B               2 "AB"  AB Group 
#> 10     4 F     C               1 "C"   Only C   
#> 11     5 F     B               1 "B"   Only B   
#> 12     6 <NA>  A               1 "A"   Only A   
#> 13     7 M     <NA>            0 ""    Missing

Created on 2021-07-08 by the reprex package (v2.0.0)

Thanks for the very helpful scripts. For replacing missing values in "sex", I have added the extra line

d.f %>%
   group_by(id) %>%
   mutate(
      ngroups=n_distinct(diagnosis, na.rm=TRUE),
      gps=str_flatten(setdiff(unique(diagnosis), NA)),
      wanted=case_when(
         ngroups==1~glue::glue("Only {gps}"),
         ngroups>1~glue::glue("{gps} Group"),
         TRUE~"Missing"
      )
   ) %>%fill(sex,.direction="updown") %>%  ungroup()

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.