I am trying to remove rows whose value does not appear often enough to be worth analysing. For that I want to use fct_lump_min from the forcats package (see R: Lump together factor levels into "other"). You tell the function the minimum number of times a value has to appear; any value that appears less often is overwritten, in this case with "Too Rare", which you can then search for and delete. Unfortunately, when there is nothing to label as "Too Rare", every row gets erased instead of none. In the example below, everything works as intended as long as there is something to label (with n = 3 the bananas are dropped, but the apples stay). If you change the value to n = 2, however, or if you concatenate the data frame with itself a few times (so that there are also at least 3 bananas), everything is erased. Any idea how to fix this?
#n=3, works as intended
library(forcats) #provides fct_lump_min
Fruit<-c("Banana", "Apple", "Banana", "Apple", "Apple")
Origin<-c("New Guinea", "China","Germany", "USA", "Germany")
Quality<-c("Good", "Bad", "Good", "Very bad", "Decent")
Value<-c(50,75,80,60,30) #cents
Price<-c(1,2,1,3,1) #euros
Fruits<-data.frame(Fruit, Origin, Quality, Value, Price)
#m <- 5
#Fruits<-do.call("rbind", replicate(m, Fruits, simplify = FALSE))
Fruits<-Fruits[-c(which(fct_lump_min(
Fruits$`Fruit`,
3, w = NULL, other_level = "Too Rare") == "Too Rare")),]
#n=2, erases everything
Fruit<-c("Banana", "Apple", "Banana", "Apple", "Apple")
Origin<-c("New Guinea", "China","Germany", "USA", "Germany")
Quality<-c("Good", "Bad", "Good", "Very bad", "Decent")
Value<-c(50,75,80,60,30) #cents
Price<-c(1,2,1,3,1) #euros
Fruits<-data.frame(Fruit, Origin, Quality, Value, Price)
#m <- 5
#Fruits<-do.call("rbind", replicate(m, Fruits, simplify = FALSE))
Fruits<-Fruits[-c(which(fct_lump_min(
Fruits$`Fruit`,
2, w = NULL, other_level = "Too Rare") == "Too Rare")),]
#Concatenation with n=3, erases everything
Fruit<-c("Banana", "Apple", "Banana", "Apple", "Apple")
Origin<-c("New Guinea", "China","Germany", "USA", "Germany")
Quality<-c("Good", "Bad", "Good", "Very bad", "Decent")
Value<-c(50,75,80,60,30) #cents
Price<-c(1,2,1,3,1) #euros
Fruits<-data.frame(Fruit, Origin, Quality, Value, Price)
m <- 5
Fruits<-do.call("rbind", replicate(m, Fruits, simplify = FALSE))
Fruits<-Fruits[-c(which(fct_lump_min(
Fruits$`Fruit`,
3, w = NULL, other_level = "Too Rare") == "Too Rare")),]
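
A minimal sketch of what seems to be going on, using a cut-down version of the data frame above. When no level is lumped, which() returns integer(0), and negating an empty index vector still gives an empty index vector, so the subset selects zero rows rather than dropping zero rows. Indexing with a logical vector avoids this. The names lumped, rare_rows, dropped and kept are only for illustration and are not part of the original code.

```r
library(forcats)

Fruits <- data.frame(
  Fruit = c("Banana", "Apple", "Banana", "Apple", "Apple"),
  Value = c(50, 75, 80, 60, 30)
)

# With min = 2 both fruits are frequent enough, so nothing becomes "Too Rare"
lumped <- fct_lump_min(Fruits$Fruit, min = 2, other_level = "Too Rare")

# which() finds no matches and returns integer(0)
rare_rows <- which(lumped == "Too Rare")
length(rare_rows)             # 0

# -integer(0) is still integer(0): this selects no rows at all
dropped <- Fruits[-rare_rows, ]
nrow(dropped)                 # 0

# A logical index keeps every row that is not "Too Rare",
# whether or not any such row exists
kept <- Fruits[lumped != "Too Rare", ]
nrow(kept)                    # 5
```

With min = 3 the same logical index keeps only the three apple rows, so one expression covers both cases without which() or negative indices.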