How do you find the maximum value for a factor variable?

Hi, I am trying to recode a race/ethnicity variable so that it has three categories for a given country: Majority, Minority, and NA for the rest. I started with ifelse and mutate, but I am having trouble coming up with a conditional statement that identifies the most numerous racial group per country. I appreciate any help.

df <- df %>%
mutate(ethnicity = ifelse(ethnicity[n == max(n)],"Majority", "Minority"))%>%

Please show a little of your data. You can post the output of

dput(head(df, 20))

Please put a line with three backticks just before and just after the posted output, like this:
output of dput()

The output of dput(head(df, 20)) would have been more convenient than the output of glimpse. I constructed a toy data set for an example that I hope will give you what you need. I first calculate which ethnic group appears most often in each country. Then, for each row of the original data, I append a column showing which ethnic group is most common for that country. If the row's own ethnic group matches the most common one, I label that row Dominant. Otherwise the label is Minority.
Note that if two ethnic groups have exactly the same number of members in a country, both groups will be appended in the left_join and you will have to decide how to deal with that. Check how many rows your data frame has originally and after running the code to see if this happened. You will get extra rows if two ethnic groups were equal.
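To make the tie case concrete, here is a minimal sketch. The data frame TieDF and the object names are made up for illustration, and it assumes a dplyr version that has slice_max (1.0.0 or later). It shows the row count growing after the join when two groups tie, and one possible way to keep a single group per country with with_ties = FALSE. Recent dplyr versions may also warn about a many-to-many relationship in the first join.

```r
library(dplyr)

# Hypothetical toy data: in country "C", groups "Q" and "W" are tied at 2 rows each.
TieDF <- data.frame(Country = c("C", "C", "C", "C"),
                    EthnicGroup = c("Q", "W", "Q", "W"))

# Rows per Country/EthnicGroup pair
Counts <- TieDF |> count(Country, EthnicGroup, name = "N")

# Default behaviour (with_ties = TRUE): both tied groups are kept
TiesKept <- Counts |> group_by(Country) |> slice_max(order_by = N)

# Joining then duplicates every original row once per tied group
Joined <- left_join(TieDF, TiesKept, by = "Country")
nrow(TieDF)   # rows before the join
nrow(Joined)  # rows after the join; larger when a tie occurred

# One possible fix: keep exactly one group per country, chosen arbitrarily among ties
OnePerCountry <- Counts |> group_by(Country) |>
  slice_max(order_by = N, with_ties = FALSE)
JoinedOne <- left_join(TieDF, OnePerCountry, by = "Country")
```

Whether picking one tied group arbitrarily is acceptable depends on your analysis. You may prefer to inspect the tied countries by hand instead.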
I included several steps where I print out the intermediate data frames to make it clearer how the code works. Those steps are not necessary.

library(dplyr)

# Toy data: one row per person, with country and ethnic group
DF <- data.frame(Country = c("A", "A", "A", "A",
                             "B", "B", "B", "B"),
                 EthnicGroup = c("Q", "W", "Q", "Q",
                                 "Q", "E", "E", "E"))
DF
#>   Country EthnicGroup
#> 1       A           Q
#> 2       A           W
#> 3       A           Q
#> 4       A           Q
#> 5       B           Q
#> 6       B           E
#> 7       B           E
#> 8       B           E

# Count rows per country and ethnic group, then keep the largest group in each country
MaxEthnic <- DF |> group_by(Country, EthnicGroup) |>
  summarize(N = n()) |>
  slice_max(order_by = N)
#> `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
MaxEthnic
#> # A tibble: 2 x 3
#> # Groups:   Country [2]
#>   Country EthnicGroup     N
#>   <chr>   <chr>       <int>
#> 1 A       Q               3
#> 2 B       E               3

# Attach each country's most common group to every original row
DF <- left_join(DF, MaxEthnic, by = "Country")
DF
#>   Country EthnicGroup.x EthnicGroup.y N
#> 1       A             Q             Q 3
#> 2       A             W             Q 3
#> 3       A             Q             Q 3
#> 4       A             Q             Q 3
#> 5       B             Q             E 3
#> 6       B             E             E 3
#> 7       B             E             E 3
#> 8       B             E             E 3
# Label a row Dominant when its own group matches the country's most common group
DF <- DF |> mutate(Dom_Minor = ifelse(EthnicGroup.x == EthnicGroup.y, "Dominant", "Minority"))
DF
#>   Country EthnicGroup.x EthnicGroup.y N Dom_Minor
#> 1       A             Q             Q 3  Dominant
#> 2       A             W             Q 3  Minority
#> 3       A             Q             Q 3  Dominant
#> 4       A             Q             Q 3  Dominant
#> 5       B             Q             E 3  Minority
#> 6       B             E             E 3  Dominant
#> 7       B             E             E 3  Dominant
#> 8       B             E             E 3  Dominant

Created on 2022-03-06 by the reprex package (v2.0.1)
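If you would rather skip the join, the same labelling can probably be done in a single pipeline. This is only a sketch of an alternative, not the approach used above. It rebuilds the toy data under the made-up name DF2, uses the Majority/Minority labels from the question, and assumes EthnicGroup has no missing values. With this version, tied groups are both labelled Majority and no rows are duplicated.

```r
library(dplyr)

# Same toy data as above, under a new (hypothetical) name
DF2 <- data.frame(Country = c("A", "A", "A", "A",
                              "B", "B", "B", "B"),
                  EthnicGroup = c("Q", "W", "Q", "Q",
                                  "Q", "E", "E", "E"))

Result <- DF2 |>
  # N = number of rows sharing this row's Country and EthnicGroup
  add_count(Country, EthnicGroup, name = "N") |>
  group_by(Country) |>
  # Within each country, the group(s) with the largest N are the majority
  mutate(Dom_Minor = ifelse(N == max(N), "Majority", "Minority")) |>
  ungroup()

Result
```

Rows where EthnicGroup is NA would need separate handling, for example with case_when, before this is applied to real data.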
