How to make ntile handle repeated observations

tadger · August 23, 2021, 10:15am

Dear dplyr community,
Why does dplyr::ntile produce unnecessary levels when the number of levels (n) is "high"?
In this example:
my_basket = data.frame(
  ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"),
  ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
  Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,120)
)
library(dplyr)
With n = 4 it works well:
df1 = mutate(my_basket, quantile_rank = ntile(Price, 4))
but not with n = 10:
df1 = mutate(my_basket, quantile_rank = ntile(Price, 10))

	ITEM_GROUP	ITEM_NAME	Price	quantile_rank
1	Fruit	Apple	100	9
2	Fruit	Banana	80	6
3	Fruit	Orange	80	7
4	Fruit	Mango	90	8
5	Fruit	Papaya	65	4
6	Vegetable	Carrot	70	4
7	Vegetable	Potato	60	3
8	Vegetable	Brinjal	70	5
9	Vegetable	Raddish	25	1
10	Dairy	Milk	60	3
11	Dairy	Curd	40	2
12	Dairy	Cheese	35	1
13	Dairy	Milk	50	2
14	Dairy	Paneer	120	10

Observations 2 and 3 have the same price (80) but are classified into different quantiles (6 and 7). Is it possible to avoid this?
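
One possible workaround, offered as a sketch rather than an official dplyr feature: ntile() ranks the input with row_number(), which breaks ties by position, so equal prices can fall on either side of a bucket boundary. Ranking with min_rank() instead gives tied values the same rank, and therefore the same bucket. The helper name ntile_ties and the bucket formula below are assumptions made for illustration; they are not part of dplyr.

```r
library(dplyr)

# Same prices as in the example above
my_basket <- data.frame(
  Price = c(100, 80, 80, 90, 65, 70, 60, 70, 25, 60, 40, 35, 50, 120)
)

# Hypothetical helper (not a dplyr function): tied values share the rank of
# their first occurrence (min_rank), so they always map to the same bucket.
ntile_ties <- function(x, n) {
  len <- sum(!is.na(x))                       # number of non-missing values
  as.integer(floor(n * (min_rank(x) - 1) / len) + 1)
}

df_ties <- mutate(my_basket, quantile_rank = ntile_ties(Price, 10))

# Each distinct price should now map to exactly one bucket
count(distinct(df_ties, Price, quantile_rank), Price)
```

The trade-off is that buckets are no longer equally sized, and with many ties relative to n some bucket numbers may go unused. If balanced groups matter more than keeping ties together, plain ntile() is probably still the better fit.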
