Hi Masters,
I have this simple data frame with one row per URN, each with a ModelCat and a Year.
source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh",
          "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo"),
  ModelCat = c("Model 1", "Model 1", "Model 2", "Model 2", "Model 2",
               "Model 1", "Model 2", "Model 3", "Model 4", "Model 5",
               "Model 5", "Model 5", "Model 1", "Model 4", "Model 1"),
  Year = c(2019, 2020, 2020, 2019, 2020, 2020, 2020, 2020,
           2020, 2019, 2019, 2020, 2019, 2020, 2020)
)
I would like to flag only the main models, that is, the most frequent ModelCat values.
andresrcs helped me a lot with this clever solution:
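library(dplyr)
library(forcats)

# Share of each ModelCat across all rows (both years together)
Sales.models <- source %>%
  add_count(ModelCat, name = "Mod.Freq") %>%
  mutate(Mod.Freq = Mod.Freq / n())

# Lump every ModelCat whose overall share falls below the 25% threshold into "Other"
Sales.models <- Sales.models %>%
  mutate(Main.Models = fct_lump(ModelCat, prop = 0.25))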
The challenge I have is computing the model frequency from a single year (2020) only and then applying the result to the entire data set. In other words, any ModelCat whose share of the 2020 rows is higher than 25% should keep its own name in Main.Models, and every other ModelCat should be coded as "Other", for rows from all years.
Is this doable?
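To make the rule above concrete, here is a minimal sketch of one way it might be done with dplyr. It is not andresrcs's solution, only an illustration. The names main_2020, n_2020 and Freq.2020 are made up for this example, and the sample data is repeated so the snippet runs on its own.

```r
library(dplyr)

# Sample data repeated from the post so the snippet is self-contained
source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh",
          "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo"),
  ModelCat = c("Model 1", "Model 1", "Model 2", "Model 2", "Model 2",
               "Model 1", "Model 2", "Model 3", "Model 4", "Model 5",
               "Model 5", "Model 5", "Model 1", "Model 4", "Model 1"),
  Year = c(2019, 2020, 2020, 2019, 2020, 2020, 2020, 2020,
           2020, 2019, 2019, 2020, 2019, 2020, 2020)
)

# Step 1: compute each ModelCat's share using the 2020 rows only,
# then keep the names whose share is above 25%
# (main_2020, n_2020 and Freq.2020 are hypothetical names)
main_2020 <- source %>%
  filter(Year == 2020) %>%
  count(ModelCat, name = "n_2020") %>%
  mutate(Freq.2020 = n_2020 / sum(n_2020)) %>%
  filter(Freq.2020 > 0.25) %>%
  pull(ModelCat)

# Step 2: apply that 2020-based list to every row, whatever its Year
Sales.models <- source %>%
  mutate(Main.Models = if_else(ModelCat %in% main_2020, ModelCat, "Other"))
```

With the sample data, Model 1 and Model 2 each account for 3 of the 10 rows from 2020 (30%), so they keep their names and Models 3, 4 and 5 become "Other" in both years. If Main.Models needs to be a factor, as fct_lump returns, wrapping the result in factor() or using forcats::fct_other(ModelCat, keep = main_2020) should give something similar.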