Agreement of factors within another factor level

stevemidway · May 8, 2022, 12:48pm

I need to calculate agreement of one factor within the levels of another factor in a data set. For example, imagine a data set where groups of 3 people are asked their favorite color among red, blue, and yellow. I need a way to classify all 10 possible outcomes, ignoring who picked which color (e.g., all 3 people like red, all 3 like blue, all 3 like yellow, 2 like red and 1 likes blue, etc.).

Here is a simple example data frame with 3 groups and their color selections:

group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue", "yellow","yellow","red")
df <- as.data.frame(cbind(group, color))

I recognize that agreement can be boiled down to a binary outcome: if everyone in a group picks the same color, that is agreement, and if one or more people pick a different color, that is disagreement. I may end up needing that, but right now I would like a way to move through these factors and report which of the 10 possible outcomes occurred before I simplify them into agree or disagree.
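For reference, the binary agree/disagree version can be sketched in base R. This is only a minimal illustration built on the example vectors above; the names `picks` and `agree` are made up for the sketch. A group counts as agreeing when it contains exactly one distinct color.

```r
group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue","yellow","yellow","red")
df <- data.frame(group, color)

# Split the colors by group, then flag a group as "agree" when all of its
# members picked the same color (exactly one distinct value).
agree <- vapply(
  split(df$color, df$group),
  function(picks) length(unique(picks)) == 1,
  logical(1)
)

# Named logical vector: a is FALSE, b is TRUE, c is FALSE
agree
```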

jrmuirhead · May 8, 2022, 5:23pm

Hi @stevemidway and welcome!

Interesting problem. I approached this in 3 steps.

1. Tabulating the number of favorite colors per group (df_summarise).
2. Creating a lookup table of possible color combinations (combs_tabulated), with a column outcome indicating which of the possible combinations of the 3 colors was used.
3. Matching the lookup table to df_summarise and assigning the outcome to each group.

library("tidyverse")

group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue", "yellow","yellow","red")

df <- tibble(group, color) %>%
  mutate(group = factor(group),
    color = factor(color))


# Create table summarizing number of favorite colors picked for each group.
df_summarise <- df %>%
  mutate(n = 1) %>%
  pivot_wider(id_cols = group, names_from = color, values_from = n,
    values_fn = sum, values_fill = 0)

df_summarise
#> # A tibble: 3 × 4
#>   group   red  blue yellow
#>   <fct> <dbl> <dbl>  <dbl>
#> 1 a         1     1      1
#> 2 b         0     3      0
#> 3 c         1     0      2

# Create a dummy coded list of all combinations
combs <- as.data.frame(gtools::combinations(3, 3,
  v = c("red", "yellow", "blue"), 
  repeats.allowed = TRUE)) %>%
  mutate(outcome = seq(nrow(.)))

combs_tabulated <- combs %>%
  pivot_longer(cols = V1:V3) %>%
  mutate(n = 1) %>%
  pivot_wider(id_cols = outcome, names_from = value, values_from = n,
    values_fn = sum, values_fill = 0)

combs_tabulated
#> # A tibble: 10 × 4
#>    outcome  blue   red yellow
#>      <int> <dbl> <dbl>  <dbl>
#>  1       1     3     0      0
#>  2       2     2     1      0
#>  3       3     2     0      1
#>  4       4     1     2      0
#>  5       5     1     1      1
#>  6       6     1     0      2
#>  7       7     0     3      0
#>  8       8     0     2      1
#>  9       9     0     1      2
#> 10      10     0     0      3

# Match results with table of combinations
group_categorized  <- df_summarise %>%
  left_join(combs_tabulated)
#> Joining, by = c("red", "blue", "yellow")

group_categorized
#> # A tibble: 3 × 5
#>   group   red  blue yellow outcome
#>   <fct> <dbl> <dbl>  <dbl>   <int>
#> 1 a         1     1      1       5
#> 2 b         0     3      0       1
#> 3 c         1     0      2       9

Created on 2022-05-08 by the reprex package (v2.0.1)
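As a quick sanity check on the size of the lookup table, the number of unordered outcomes is a "combinations with repetition" count, which base R can compute with `choose()`. The sketch below is an illustration rather than part of the reprex above; `n_colors`, `group_size`, `n_outcomes`, and `counts` are names invented here. It also shows a base R cross-tabulation that holds the same per-group counts as df_summarise.

```r
# Number of unordered outcomes when `group_size` people each pick one of
# `n_colors` colors (combinations with repetition).
n_colors <- 3
group_size <- 3
n_outcomes <- choose(n_colors + group_size - 1, group_size)
n_outcomes
#> [1] 10

# Base R cross-tabulation of picks per group; columns are ordered
# alphabetically (blue, red, yellow).
group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue","yellow","yellow","red")
counts <- table(group, color)
counts
```

The value 10 matches the number of rows in combs_tabulated.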

stevemidway · May 16, 2022, 1:21am

Thanks for the feedback, @jrmuirhead!

I really like your response. I think you coded out what I had in mind but was unable to figure out, so thank you. Prior to your response I went with a much less robust approach, in which I pasted all the colors for a group into one concatenated string and stored it in a new vector. I wrote a short loop with the following code inside:

temp <- filter(df, ID == group[i])
agree[i] <- paste(temp$color[1],temp$color[2], temp$color[3],sep = "_")

I then had a vector (agree) of all the group outcomes, which I could tabulate and figure out agreement with.
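A self-contained version of that loop might look like the sketch below. It makes a few assumptions that are not in the post: it uses the example data frame from the top of the thread (so the grouping column is `group` rather than `ID`), the names `groups` and `picks` are invented, and it sorts the colors before pasting, which is an addition so that the label does not depend on the order of rows within a group.

```r
group <- c("a","a","a","b","b","b","c","c","c")
color <- c("red","blue","yellow","blue","blue","blue","yellow","yellow","red")
df <- data.frame(group, color)

groups <- unique(df$group)
agree <- character(length(groups))

for (i in seq_along(groups)) {
  # Colors picked within the i-th group
  picks <- df$color[df$group == groups[i]]
  # Sort before pasting so that, e.g., "red, blue" and "blue, red"
  # produce the same label (an assumption added in this sketch)
  agree[i] <- paste(sort(picks), collapse = "_")
}
names(agree) <- groups

agree
# Tabulate how many groups fall into each outcome
table(agree)
```

Without the sort, two groups with the same colors in a different row order would get different labels and be counted as separate outcomes.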
