I have a dataframe dfcolor.
subjectid is the unique id of each individual and grpmonth1:grpmonth5 are columns showing where each individual are assigned each month (total 5 month) across three group (RED, GREEN, BLUE)
dfcolor <- data.frame(
subjectid = c(1,2,3,4,5,6,7,8,9,10,11),
grpmonth1 = c("RED", "BLUE", "GREEN", "RED", "GREEN", "RED", "GREEN", "GREEN", "BLUE", "GREEN", "BLUE"),
grpmonth2 = c("RED", "RED", "GREEN", "RED", "RED", "RED", "GREEN", "GREEN", "RED", "GREEN", "BLUE"),
grpmonth3 = c("GREEN", "BLUE", "GREEN", "RED", "GREEN", "RED", "GREEN", "BLUE", "RED", "GREEN", "BLUE"),
grpmonth4 = c("BLUE", "BLUE", "GREEN", "RED", "GREEN", "RED", "RED", "GREEN", "RED", "GREEN", "BLUE"),
grpmonth5 = c("RED", "BLUE", "RED", "RED", "RED", "RED", "GREEN", "GREEN", "RED", "BLUE", "BLUE"))
dfcolor
#> subjectid grpmonth1 grpmonth2 grpmonth3 grpmonth4 grpmonth5
#> 1 1 RED RED GREEN BLUE RED
#> 2 2 BLUE RED BLUE BLUE BLUE
#> 3 3 GREEN GREEN GREEN GREEN RED
#> 4 4 RED RED RED RED RED
#> 5 5 GREEN RED GREEN GREEN RED
#> 6 6 RED RED RED RED RED
#> 7 7 GREEN GREEN GREEN RED GREEN
#> 8 8 GREEN GREEN BLUE GREEN GREEN
#> 9 9 BLUE RED RED RED RED
#> 10 10 GREEN GREEN GREEN GREEN BLUE
#> 11 11 BLUE BLUE BLUE BLUE BLUE
Created on 2021-09-01 by the reprex package (v2.0.1)
Now I need to create 5 set of columns showing how many individual were there in each month for the group that particular individual is in that particular month. Below I can see how many individuals were there in each group/each month manually.
totalmnth1 <- dfcolor %>% group_by(grpmonth1) %>% summarize(n())
totalmnth2 <- dfcolor %>% group_by(grpmonth2) %>% summarize(n())
totalmnth3 <- dfcolor %>% group_by(grpmonth3) %>% summarize(n())
totalmnth4 <- dfcolor %>% group_by(grpmonth4) %>% summarize(n())
totalmnth5 <- dfcolor %>% group_by(grpmonth5) %>% summarize(n())
But how do I link this with the main df i.e. dfcolor easily and build the five columns that I need that links each individual with number of group members in the group they are in each month?
Lets say for "subjectid" 1, he is in group "red" in month 1 and so his value for the new column ttlgrppl_m1 will be total people who were in group "red" during month 1.
A mock example I manually made just to illustrate how I want my final data to look like (NOTE: the numbers I kept there are random),
dfcolornew
#> subjectid grpmonth1 grpmonth2 grpmonth3 grpmonth4 grpmonth5 ttlgrppl_m1
#> 1 1 RED RED GREEN BLUE RED 3
#> 2 2 BLUE RED BLUE BLUE BLUE 2
#> 3 3 GREEN GREEN GREEN GREEN RED 1
#> 4 4 RED RED RED RED RED 3
#> 5 5 GREEN RED GREEN GREEN RED 2
#> 6 6 RED RED RED RED RED 1
#> 7 7 GREEN GREEN GREEN RED GREEN 3
#> 8 8 GREEN GREEN BLUE GREEN GREEN 2
#> 9 9 BLUE RED RED RED RED 1
#> 10 10 GREEN GREEN GREEN GREEN BLUE 3
#> 11 11 BLUE BLUE BLUE BLUE BLUE 2
#> ttlgrppl_m2 ttlgrppl_m3 ttlgrppl_m4 ttlgrppl_m5
#> 1 3 3 3 3
#> 2 3 1 2 2
#> 3 1 1 1 1
#> 4 3 2 3 2
#> 5 2 2 2 2
#> 6 1 3 2 1
#> 7 3 3 3 1
#> 8 2 2 1 2
#> 9 1 1 1 1
#> 10 3 3 3 3
#> 11 2 2 2 2
Created on 2021-09-01 by the reprex package (v2.0.1)
P.S I need to work with huge number of group values so I need to use a range. E.g my group can be from group 1:5000. If you could also include that in your suggestions, that will be awesome !
Thank you R community !