Cleaning data across multiple columns/variables

Hi everyone, have a large dataset with two type of patient identifiers--a patient ID, and a unique identifier for each admission (for example, patient C who has been admitted 3 times will have 3 unique admission identifiers). I also have a column with outcomes Survived and Death, and race. How do I manipulate my data so that multiple admission are not counted, and ultimately, the eventual outcome of Survived vs Death for each patient is counted? Also interested in Race to see if there's any correlation. I have included some sample data and how I eventually want my table to look. Please help! Thank you!!

DF <- data.frame(
  Patient.ID = c("A", "B", "C", "C", "C", "D", "D"),
  Admit.ID = c("1Zz", "1Yy", "5Pp", "3Cc", "9Dd", "4Yy", "4Dd"),
  Race = c("White", "Black", "Asian", "Asian", NA, "Black", "Black"),
  Survived = c(1, 0, 1, 0, 1, 1, 1),
  Died = c(0, 1, 0, 1, 0, 0, 0))
DF_Cleaned <- data.frame(
  Patient.ID = c("A", "B", "C", "D"),
  Race = c("White", "Black", "Asian", "Black"),
  Outcome = c(0, 1, 1, 0))

Created on 2020-11-20 by the reprex package (v0.3.0)

Does this work for you?

library(dplyr)

DF <- data.frame(
    Patient.ID = c("A", "B", "C", "C", "C", "D", "D"),
    Admit.ID = c("1Zz", "1Yy", "5Pp", "3Cc", "9Dd", "4Yy", "4Dd"),
    Race = c("White", "Black", "Asian", "Asian", NA, "Black", "Black"),
    Survived = c(1, 0, 1, 0, 1, 1, 1),
    Died = c(0, 1, 0, 1, 0, 0, 0))

DF %>% 
    group_by(Patient.ID) %>% 
    summarise(Race = first(na.omit(Race)),
              Outcome = if_else(sum(Died) > 0, 1, 0)
              )
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 3
#>   Patient.ID Race  Outcome
#>   <chr>      <chr>   <dbl>
#> 1 A          White       0
#> 2 B          Black       1
#> 3 C          Asian       1
#> 4 D          Black       0

Created on 2020-11-21 by the reprex package (v0.3.0.9001)

slight simplification is possible

DF %>% 
  group_by(Patient.ID) %>% 
  summarise(Race = first(na.omit(Race)),
            Outcome = last(Died)
  )
1 Like

I didn't assume that the records are chronologically ordered but that makes sense.

Thank you both! The data is not chronologically order unfortunately, but using this worked perfectly!

DF %>% 
    group_by(Patient.ID) %>% 
    summarise(Race = first(na.omit(Race)),
              Outcome = if_else(sum(Died) > 0, 1, 0)
              )

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.