Grouping factor levels by their frequency within one value of another variable

Hi, andresrcs has kindly helped me group the elements of "Model" based on their frequency in the data frame. Now I'm trying something a bit more complicated: I would like to apply the same rule, but based on frequency in 2020 rather than overall frequency.
Basically, only "bb" and "cc" should stay (their 2020 proportion is >= 0.4); all other Models should be coded as "Other".
The code below uses the proportions over my whole data frame rather than the 2020 proportions.

library(dplyr)
library(forcats)

Sales.data.t <- data.frame(stringsAsFactors = FALSE,
                           Year = c(2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2019),
                           Model = c("cc", "aa", "gg", "cc", "bb", "bb", NA,
                                     "cc", "cc", "bb"),
                           RType = c("H", "A", "A", "H", "B", "h", "A", "H",
                                     "H", "B")
)

prop <- with(Sales.data.t, table(Model, Year)) %>% 
  prop.table(margin = 2)
prop

Sales.data.t <- Sales.data.t %>% 
  mutate(Main.Models = fct_lump(Model, prop = 0.4)) 

Sales.data.t

Can you help please?

Since you are already calculating the proportions manually, you could take the levels to keep from that table and pass them to fct_other(). See this example:

library(dplyr)
library(forcats)

Sales.data.t <- data.frame(stringsAsFactors = FALSE,
                           Year = c(2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2019),
                           Model = c("cc", "aa", "gg", "cc", "bb", "bb", NA,
                                     "cc", "cc", "bb"),
                           RType = c("H", "A", "A", "H", "B", "h", "A", "H",
                                     "H", "B")
)

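# Column-wise proportions: the share of each Model within each Year.
# as_tibble() reshapes the table to long format, with the proportion in column `n`.
keep_levels <- with(Sales.data.t, table(Model, Year)) %>% 
    prop.table(margin = 2) %>% 
    as_tibble() %>%
    # keep only the Models whose 2020 share is at least 0.4
    filter(Year == 2020, n >= 0.4) %>% 
    pull(Model)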


Sales.data.t %>% 
    mutate(Main.Models = fct_other(Model, keep = keep_levels)) 
#>    Year Model RType Main.Models
#> 1  2019    cc     H          cc
#> 2  2019    aa     A       Other
#> 3  2019    gg     A       Other
#> 4  2020    cc     H          cc
#> 5  2020    bb     B          bb
#> 6  2020    bb     h          bb
#> 7  2020  <NA>     A        <NA>
#> 8  2020    cc     H          cc
#> 9  2020    cc     H          cc
#> 10 2019    bb     B          bb

Created on 2019-10-26 by the reprex package (v0.3.0.9000)
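
If you would rather skip the table() step, the same idea can probably be written with dplyr verbs only. This is just a sketch under a couple of assumptions: the names keep_levels_2020 and result are made up for illustration, and I'm assuming rows with a missing Model should be left out of the denominator, which is what table() does by default.

```r
library(dplyr)
library(forcats)

Sales.data.t <- data.frame(stringsAsFactors = FALSE,
                           Year = c(2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2019),
                           Model = c("cc", "aa", "gg", "cc", "bb", "bb", NA,
                                     "cc", "cc", "bb"))

# Share of each Model among the non-missing 2020 rows
# (keep_levels_2020 is an illustrative name, not from the original post)
keep_levels_2020 <- Sales.data.t %>%
  filter(Year == 2020, !is.na(Model)) %>%  # assumption: drop NA, as table() does
  count(Model) %>%
  mutate(prop = n / sum(n)) %>%
  filter(prop >= 0.4) %>%
  pull(Model)

# Recode every Model that did not reach the 2020 threshold as "Other"
result <- Sales.data.t %>%
  mutate(Main.Models = fct_other(Model, keep = keep_levels_2020))

result
```

The threshold is only evaluated on the 2020 rows, but the recoding is applied to every year, so a 2019 row with Model "bb" keeps its level.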


Very clever, thank you!!!
