How to group then count column

trangility · December 7, 2020, 8:31pm

Hi there! I have a dataset with two unique identifiers: one for every admission and one for every unique patient (for example, patient "C" was admitted 3 times, so has one unique Patient.ID and 3 Admit.IDs). I want to count how many males and females there are, but counted per Patient.ID instead of per Admit.ID. Then I also want to count how many females vs. males died, again per Patient.ID. Here is a sample dataset and what I've tried so far, which has not worked!

library(dplyr)

DF <- data.frame(
    Patient.ID = c("A", "B", "C", "C", "C", "D", "D"),
    Admit.ID = c("1Zz", "1Yy", "5Pp", "3Cc", "9Dd", "4Yy", "4Dd"),
    Gender = c("Female", "Male", "Male", "Male", "Male", "Female", "Female"),
    Male = c(0, 1, 1, 1, 1, 0, 0),
    Female = c(1, 0, 0, 0, 0, 1, 1),
    Survived = c(1, 0, 1, 0, 1, 1, 1),
    Died = c(0, 1, 0, 1, 0, 0, 0))

I have tried:
DF %>%
  group_by(Patient.ID) %>%
  summarise(Female_count = n())

For which I get:
   Patient.Id Female_count
 1          1            1
 2          2            1
 3          3            2
 4          4            1
 5          5            1
 6          6            1
 7          7            1
 8          8            1
 9          9            1
10         10            1
... with 62,708 more rows

What I want is basically (from sample table):
2 female patients total, 0 died
2 male patients total, 2 died

Please help!

FJCC · December 7, 2020, 9:42pm

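I first collapsed the data to one row per patient, then counted by gender. Within each patient I took the product of Survived so that it will be zero unless all of the values are one, i.e. a patient counts as a survivor only if they survived every admission. In the final table, Surv is the number of surviving patients per gender, so the number who died is N - Surv.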

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
DF <- data.frame(
  Patient.ID = c("A", "B", "C", "C", "C", "D", "D"),
  Admit.ID = c("1Zz", "1Yy", "5Pp", "3Cc", "9Dd", "4Yy", "4Dd"),
  Gender = c("Female", "Male", "Male", "Male", "Male", "Female", "Female"),
  Male = c(0, 1, 1, 1, 1, 0, 0),
  Female = c(1, 0, 0, 0, 0, 1, 1),
  Survived = c(1, 0, 1, 0, 1, 1, 1),
  Died = c(0, 1, 0, 1, 0, 0, 0))
tmp <- DF %>% group_by(Patient.ID) %>% 
  summarize(Gender = mean(Female), Surv = prod(Survived))
#> `summarise()` ungrouping output (override with `.groups` argument)
#Gender = 1 means Female
tmp %>% group_by(Gender) %>% summarize(N = n(), Surv = sum(Surv))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 3
#>   Gender     N  Surv
#>    <dbl> <int> <dbl>
#> 1      0     2     0
#> 2      1     2     2

Created on 2020-12-07 by the reprex package (v0.3.0)
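
A possible variation, in case you would rather see the deaths directly and keep the gender labels: this is only a sketch, and it assumes that Gender never changes within a patient and that a patient counts as dead if any of their admissions has Died == 1. The names per_patient, Patients and Deaths are just illustrative, and the `.groups` argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

DF <- data.frame(
  Patient.ID = c("A", "B", "C", "C", "C", "D", "D"),
  Admit.ID = c("1Zz", "1Yy", "5Pp", "3Cc", "9Dd", "4Yy", "4Dd"),
  Gender = c("Female", "Male", "Male", "Male", "Male", "Female", "Female"),
  Male = c(0, 1, 1, 1, 1, 0, 0),
  Female = c(1, 0, 0, 0, 0, 1, 1),
  Survived = c(1, 0, 1, 0, 1, 1, 1),
  Died = c(0, 1, 0, 1, 0, 0, 0))

# Step 1: one row per patient.
# Assumption: Gender is constant within a patient, so the first value is enough.
# Assumption: a patient is counted as dead if any admission has Died == 1.
per_patient <- DF %>%
  group_by(Patient.ID) %>%
  summarise(Gender = first(Gender), Died = max(Died), .groups = "drop")

# Step 2: count patients and deaths within each gender.
result <- per_patient %>%
  group_by(Gender) %>%
  summarise(Patients = n(), Deaths = sum(Died), .groups = "drop")

result
```

On the sample data this should give 2 female patients with 0 deaths and 2 male patients with 2 deaths, matching what was asked for.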

trangility · December 8, 2020, 6:31am

Thank you very much for your help!
