I need to calculate agreement on one factor within the levels of another factor in a data set. For example, imagine a data set where groups of 3 people are asked their favorite color among red, blue, and yellow. I need a way to classify every possible outcome (e.g., all 3 people like red, all 3 people like blue, all 3 people like yellow, 2 people like red and one person likes blue, etc.).
Here is a simple example dataframe with 3 groups and their color selections:
group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue", "yellow","yellow","red")
df <- as.data.frame(cbind(group, color))
I recognize that agreement can be boiled down to a binary outcome: if all people in a group pick the same color, that is agreement, and if one or more people pick a different color, that is disagreement. I may end up needing that, but right now I would like a way to move through these factors and report which of the possible outcomes occurred in each group before I simplify them into agree or disagree.
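For reference, the binary simplification described above could be sketched in base R as follows. This is only a minimal sketch using the example data; the name `agreement` is illustrative and not part of any existing code.

```r
group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue","yellow","yellow","red")
df <- data.frame(group, color)

# For each group, TRUE when every member picked the same color
# (i.e. the group contains exactly one distinct color), FALSE otherwise.
agreement <- tapply(df$color, df$group, function(x) length(unique(x)) == 1)
agreement
```

With the example data, only group b (three votes for blue) counts as agreement.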
Interesting problem. I approached this in 3 steps.
1. Tabulating the number of times each color was picked per group, df_summarise.
2. Creating a lookup table of possible color combinations, combs_tabulated, with a column outcome indicating which of the possible combinations of the 3 colors was used.
3. Matching the lookup table to df_summarise and assigning the outcome to each group.
library("tidyverse")
group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue", "yellow","yellow","red")
df <- tibble(group, color) %>%
  mutate(group = factor(group),
         color = factor(color))
# Create table summarizing number of favorite colors picked for each group.
df_summarise <- df %>%
  mutate(n = 1) %>%
  pivot_wider(id_cols = group, names_from = color, values_from = n,
              values_fn = sum, values_fill = 0)
df_summarise
#> # A tibble: 3 × 4
#>   group   red  blue yellow
#>   <fct> <dbl> <dbl>  <dbl>
#> 1 a         1     1      1
#> 2 b         0     3      0
#> 3 c         1     0      2
# Create a dummy coded list of all combinations
combs <- as.data.frame(gtools::combinations(3, 3,
                                            v = c("red", "yellow", "blue"),
                                            repeats.allowed = TRUE)) %>%
  mutate(outcome = seq(nrow(.)))
combs_tabulated <- combs %>%
  pivot_longer(cols = V1:V3) %>%
  mutate(n = 1) %>%
  pivot_wider(id_cols = outcome, names_from = value, values_from = n,
              values_fn = sum, values_fill = 0)
combs_tabulated
#> # A tibble: 10 × 4
#>    outcome  blue   red yellow
#>      <int> <dbl> <dbl>  <dbl>
#>  1       1     3     0      0
#>  2       2     2     1      0
#>  3       3     2     0      1
#>  4       4     1     2      0
#>  5       5     1     1      1
#>  6       6     1     0      2
#>  7       7     0     3      0
#>  8       8     0     2      1
#>  9       9     0     1      2
#> 10      10     0     0      3
# Match results with table of combinations
group_categorized <- df_summarise %>%
  left_join(combs_tabulated)
#> Joining, by = c("red", "blue", "yellow")
group_categorized
#> # A tibble: 3 × 5
#>   group   red  blue yellow outcome
#>   <fct> <dbl> <dbl>  <dbl>   <int>
#> 1 a         1     1      1       5
#> 2 b         0     3      0       1
#> 3 c         1     0      2       9
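The integer outcome codes are compact but not self-describing. As a possible follow-on, here is a minimal base R sketch that builds a readable label for each group directly from the counts. The helper name `label_counts` and the "color:count" label format are assumptions for illustration, not part of the code above. It also checks the expected size of the lookup table: choosing 3 items from 3 colors with repetition gives choose(3 + 3 - 1, 3) = 10 combinations, which matches the 10 rows of combs_tabulated.

```r
group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue","yellow","yellow","red")

# Cross-tabulate: one row per group, one column per color
# (columns are ordered alphabetically: blue, red, yellow).
counts <- table(group, color)

# Hypothetical helper: turn one row of counts into a label such as "blue:3",
# listing only the colors that were actually picked.
label_counts <- function(x) {
  picked <- x[x > 0]
  paste0(names(picked), ":", picked, collapse = ", ")
}

outcome_label <- apply(counts, 1, label_counts)

result <- data.frame(group = rownames(counts),
                     outcome_label = unname(outcome_label))
result

# Number of unordered outcomes for 3 people choosing among 3 colors.
n_outcomes <- choose(3 + 3 - 1, 3)
n_outcomes
```

Because the label is derived from the counts themselves, this variant needs no lookup table, at the cost of a text label rather than a numeric outcome code.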
I really like your response. I think you coded out what I had in mind but was unable to figure out, so thank you. Prior to your response, I went with a much less robust approach, in which I pasted the colors for each group into a new vector as a single concatenated variable. I wrote a short loop with the following code inside: