Frequency of main responses based on a filter

Slavek · March 30, 2021, 12:28pm

Hi Masters,
I have this simple data frame with a ModelCat and a Year for each URN.

source <- data.frame(
  stringsAsFactors = FALSE,
               URN = c("aaa","bbb","ccc","ddd",
                       "eee","fff","ggg","hhh","iii","jjj","kkk","lll",
                       "mmm", "nnn", "ooo"),
          ModelCat = c("Model 1","Model 1","Model 2",
                       "Model 2","Model 2","Model 1","Model 2","Model 3",
                       "Model 4","Model 5","Model 5","Model 5","Model 1","Model 4","Model 1"),
              Year = c(2019,2020,2020,
                       2019,2020,2020,2020,2020,2020,2019,2019,2020,2019,2020,2020)
)

I would like to select only main models.
andresrcs helped me a lot with this clever solution:

library(dplyr)
library(forcats)

# Share of each ModelCat across all rows (all years)
Sales.models <- source %>%
  add_count(ModelCat, name = "Mod.Freq") %>%
  mutate(Mod.Freq = Mod.Freq / n())

# Lump models that fall below the 25% overall-share cutoff into "Other"
Sales.models <- Sales.models %>%
  mutate(Main.Models = fct_lump(ModelCat, prop = 0.25))

My challenge is to compute the model frequency from one specific year (2020) only and then apply it to the entire data. In other words, any ModelCat whose frequency in 2020 is higher than 25% should keep its name in Main.Models, and every other ModelCat should be coded as "Other".

Is this doable?

technocrat · March 30, 2021, 8:42pm

suppressPackageStartupMessages({
  library(dplyr)
  library(forcats)
})

source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("aaa","bbb","ccc","ddd",
          "eee","fff","ggg","hhh","iii","jjj","kkk","lll",
          "mmm", "nnn", "ooo"),
  ModelCat = c("Model 1","Model 1","Model 2",
               "Model 2","Model 2","Model 1","Model 2","Model 3",
               "Model 4","Model 5","Model 5","Model 5","Model 1","Model 4","Model 1"),
  Year = c(2019,2020,2020,
           2019,2020,2020,2020,2020,2020,2019,2019,2020,2019,2020,2020)
)

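# Share of each ModelCat among the 2020 rows only
yr_2020 <- source %>%
  filter(Year == 2020) %>%
  add_count(ModelCat, name = "Mod.Freq") %>%
  mutate(Mod.Freq = Mod.Freq / n())

# With no `by` given, the join matches on all shared columns
# (URN, ModelCat, Year), so rows from other years get Mod.Freq = NA
with_freq <- left_join(source, yr_2020)
#> Joining, by = c("URN", "ModelCat", "Year")

# Rows with no 2020 share (NA) or a share below 25% become "Other"
with_freq %>%
  mutate(Main.models = case_when(
    is.na(Mod.Freq) ~ "Other",
    Mod.Freq < 0.25 ~ "Other",
    Mod.Freq >= 0.25 ~ ModelCat))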
#>    URN ModelCat Year Mod.Freq Main.models
#> 1  aaa  Model 1 2019       NA       Other
#> 2  bbb  Model 1 2020      0.3     Model 1
#> 3  ccc  Model 2 2020      0.3     Model 2
#> 4  ddd  Model 2 2019       NA       Other
#> 5  eee  Model 2 2020      0.3     Model 2
#> 6  fff  Model 1 2020      0.3     Model 1
#> 7  ggg  Model 2 2020      0.3     Model 2
#> 8  hhh  Model 3 2020      0.1       Other
#> 9  iii  Model 4 2020      0.2       Other
#> 10 jjj  Model 5 2019       NA       Other
#> 11 kkk  Model 5 2019       NA       Other
#> 12 lll  Model 5 2020      0.1       Other
#> 13 mmm  Model 1 2019       NA       Other
#> 14 nnn  Model 4 2020      0.2       Other
#> 15 ooo  Model 1 2020      0.3     Model 1
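
In the output above, the 2019 rows are all coded "Other" because the 2020 frequencies are joined back row by row. If the goal is instead to apply the 2020 shares to every year, as the question seems to ask, one possible variation is to summarise the 2020 rows to one share per ModelCat and join on ModelCat alone. This is only a sketch under that reading of the question: the names `sales`, `share_2020`, `Share.2020`, `threshold` and `result` are illustrative and not from the thread, and `>=` follows the reply's cutoff (use `>` for strictly higher than 25%).

```r
library(dplyr)

# Same example data as in the question; called `sales` here (an assumed name)
# so that it does not mask base::source()
sales <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh",
          "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo"),
  ModelCat = c("Model 1", "Model 1", "Model 2", "Model 2", "Model 2",
               "Model 1", "Model 2", "Model 3", "Model 4", "Model 5",
               "Model 5", "Model 5", "Model 1", "Model 4", "Model 1"),
  Year = c(2019, 2020, 2020, 2019, 2020, 2020, 2020, 2020,
           2020, 2019, 2019, 2020, 2019, 2020, 2020)
)

threshold <- 0.25  # assumed cutoff, taken from the 25% in the question

# One row per ModelCat: its share of the 2020 rows only
share_2020 <- sales %>%
  filter(Year == 2020) %>%
  count(ModelCat, name = "n_2020") %>%
  mutate(Share.2020 = n_2020 / sum(n_2020)) %>%
  select(ModelCat, Share.2020)

# Attach the 2020 share to every row, whatever its year.
# A model that never appears in 2020 gets NA and is coded "Other".
result <- sales %>%
  left_join(share_2020, by = "ModelCat") %>%
  mutate(Main.Models = if_else(
    !is.na(Share.2020) & Share.2020 >= threshold,
    ModelCat,
    "Other"
  ))

count(result, Main.Models)
```

With this data, Model 1 and Model 2 each account for 3 of the 10 rows from 2020, so all of their rows, including the 2019 ones, keep their model name, while Models 3, 4 and 5 are coded "Other" in every year.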
