Dear dplyr community,
Why does dplyr::ntile put tied values into different levels when the number of levels (n) is "high"?
Consider this example:
my_basket = data.frame(
  ITEM_GROUP = c("Fruit", "Fruit", "Fruit", "Fruit", "Fruit", "Vegetable", "Vegetable", "Vegetable", "Vegetable", "Dairy", "Dairy", "Dairy", "Dairy", "Dairy"),
  ITEM_NAME = c("Apple", "Banana", "Orange", "Mango", "Papaya", "Carrot", "Potato", "Brinjal", "Raddish", "Milk", "Curd", "Cheese", "Milk", "Paneer"),
  Price = c(100, 80, 80, 90, 65, 70, 60, 70, 25, 60, 40, 35, 50, 120)
)
library(dplyr)
With n = 4 it works well:
df1 = mutate(my_basket, quantile_rank = ntile(Price, 4))
but with n = 10 it does not:
df1 = mutate(my_basket, quantile_rank = ntile(Price, 10))
| | ITEM_GROUP | ITEM_NAME | Price | quantile_rank |
---|---|---|---|---|
1 | Fruit | Apple | 100 | 9 |
2 | Fruit | Banana | 80 | 6 |
3 | Fruit | Orange | 80 | 7 |
4 | Fruit | Mango | 90 | 8 |
5 | Fruit | Papaya | 65 | 4 |
6 | Vegetable | Carrot | 70 | 4 |
7 | Vegetable | Potato | 60 | 3 |
8 | Vegetable | Brinjal | 70 | 5 |
9 | Vegetable | Raddish | 25 | 1 |
10 | Dairy | Milk | 60 | 3 |
11 | Dairy | Curd | 40 | 2 |
12 | Dairy | Cheese | 35 | 1 |
13 | Dairy | Milk | 50 | 2 |
14 | Dairy | Paneer | 120 | 10 |
Observations 2 and 3 have the same price (80) but are classified into different quantiles (6 and 7). Is it possible to avoid this?
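For context, ntile() aims for buckets of roughly equal size, so tied values are ordered by position and can end up on either side of a bucket boundary. One possible workaround, offered only as a minimal sketch, is to cut the values at their empirical quantiles using base R. The helper name tie_aware_bins is hypothetical (it is not part of dplyr), and the sketch assumes that uneven bucket sizes, and possibly fewer than n buckets, are acceptable.

```r
# Prices from the example in the question
price <- c(100, 80, 80, 90, 65, 70, 60, 70, 25, 60, 40, 35, 50, 120)

# tie_aware_bins() is a hypothetical helper, not a dplyr function.
# It bins x by value (cutting at empirical quantiles) instead of by
# position, so equal values always receive the same bin number.
tie_aware_bins <- function(x, n) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # Heavy ties can produce repeated break points; unique() drops them,
  # which means fewer than n bins may be returned.
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

bins <- tie_aware_bins(price, 10)
print(data.frame(price, bins))
```

Because the helper takes a plain vector, it should also work inside mutate(), e.g. mutate(my_basket, quantile_rank = tie_aware_bins(Price, 10)). The trade-off is that the buckets no longer hold the same number of rows.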