Dear dplyr community
Why does dplyr::ntile split tied values across different levels when the number of levels (n) is "high"?
Consider this example:
my_basket = data.frame(
  ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit",
                 "Vegetable","Vegetable","Vegetable","Vegetable",
                 "Dairy","Dairy","Dairy","Dairy","Dairy"),
  ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya",
                "Carrot","Potato","Brinjal","Raddish",
                "Milk","Curd","Cheese","Milk","Paneer"),
  Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,120))
library(dplyr)
With n = 4 it works well:
df1 = mutate(my_basket, quantile_rank = ntile(Price, 4))
but with n = 10:
df1 = mutate(my_basket, quantile_rank = ntile(Price, 10))
   ITEM_GROUP ITEM_NAME Price quantile_rank
1  Fruit      Apple       100             9
2  Fruit      Banana       80             6
3  Fruit      Orange       80             7
4  Fruit      Mango        90             8
5  Fruit      Papaya       65             4
6  Vegetable  Carrot       70             4
7  Vegetable  Potato       60             3
8  Vegetable  Brinjal      70             5
9  Vegetable  Raddish      25             1
10 Dairy      Milk         60             3
11 Dairy      Curd         40             2
12 Dairy      Cheese       35             1
13 Dairy      Milk         50             2
14 Dairy      Paneer      120            10
Observations 2 and 3 have the same Price (80) but are classified into different quantiles (6 and 7). Is it possible to avoid this?
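One possible direction, offered only as a minimal sketch: ntile() aims for buckets of roughly equal size and breaks ties by position, which is why equal prices can end up in neighbouring buckets. If keeping ties together matters more than equal bucket sizes, the bucket can be derived from the minimum rank instead. The helper name tie_safe_ntile below is hypothetical (it is not part of dplyr), and it uses base R's rank() with ties.method = "min", which should behave like dplyr::min_rank() here.

```r
# Hypothetical helper, not a dplyr function: buckets are computed from the
# minimum rank, so equal values always receive the same bucket number.
tie_safe_ntile <- function(x, n) {
  # Tied values share the lowest rank of their group; NAs stay NA.
  r <- rank(x, ties.method = "min", na.last = "keep")
  n_obs <- sum(!is.na(x))
  # Map ranks 1..n_obs onto bucket numbers 1..n.
  as.integer(floor(n * (r - 1) / n_obs) + 1)
}

price <- c(100, 80, 80, 90, 65, 70, 60, 70, 25, 60, 40, 35, 50, 120)
buckets <- tie_safe_ntile(price, 10)

# Observations 2 and 3 (both 80) now fall in the same bucket.
buckets[2] == buckets[3]

# With dplyr loaded, the same helper can be used inside mutate():
# df1 = mutate(my_basket, quantile_rank = tie_safe_ntile(Price, 10))
```

The trade-off is that the buckets are no longer guaranteed to be equally sized, and some bucket numbers may go unused when ties are frequent relative to n.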