Hello everyone,
I am currently working on a small research project and have run into a problem with dplyr's mutate() fairly early in the process. I am trying to calculate each sub-group's share of its main group's total. The data looks like this:
Dataset 1:
area_id status count
1 01001 1 93928
2 01001 2 99368
3 01001 3 74463
4 01002 1 215774
5 01002 2 218672
6 01002 3 192941
7 01003 1 181151
8 01003 2 192095
9 01003 3 180232
Dataset 2:
area_id count
1 01001 267759
2 01002 627387
3 01003 553478
4 01004 243065
5 01051 331176
So as you can see, table 1 splits the data of table 2 into sub-groups according to a variable called "status". Therefore, table 1 has three times as many rows as table 2. My goal is to add a new column to table 1 that contains each sub-group's share of the total count per area unit, which is only available in table 2. I want it to look like this:
area_id status count percentage
1 01001 1 93928 0.3507967986
2 01001 2 99368 0.3711098413
3 01001 3 74463 etc.
I tried this, but it is not working:
table1 %>%
mutate(percentage=ifelse(table1$area_id == table2$area_id, table1$count/table2$count, "FALSE")
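For reference, here is a minimal sketch of one way to get the result, not a definitive solution. It assumes both tables are data frames with a shared key column named area_id, stored as character so the leading zeros survive. The name `total` is a hypothetical helper column and `result` a hypothetical object name, both introduced only for illustration. The attempt above is missing a closing parenthesis. Also, `==` compares the two `$` vectors position by position, recycling the shorter one, rather than matching rows by area_id, so a join looks like the safer route.

```r
library(dplyr)

# Sample data copied from the excerpts above; area_id kept as character
table1 <- data.frame(
  area_id = rep(c("01001", "01002", "01003"), each = 3),
  status  = rep(1:3, times = 3),
  count   = c(93928, 99368, 74463,
              215774, 218672, 192941,
              181151, 192095, 180232),
  stringsAsFactors = FALSE
)
table2 <- data.frame(
  area_id = c("01001", "01002", "01003", "01004", "01051"),
  count   = c(267759, 627387, 553478, 243065, 331176),
  stringsAsFactors = FALSE
)

result <- table1 %>%
  # Bring the per-area total into table 1; renamed so it does not clash with count
  left_join(rename(table2, total = count), by = "area_id") %>%
  # Share of each status sub-group in its area's total
  mutate(percentage = count / total) %>%
  # Drop the helper column again
  select(-total)

print(result)
```

left_join() keeps every row of table 1, so areas that appear only in table 2 (such as 01004 here) do not show up in the result.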
Can somebody help me with this one?
Thanks a lot.