How can I assign a character or value to each row according to the quantile range its value belongs to?

Hello, I have a dataset named 'week' that contains a variable named 'DIST'. It looks like this:

ID DIST
1  35463 
2  43264
3  43356
...

I want to assign a value to each row according to the sub-range its 'DIST' value belongs to, so I wrote the code below. This was my initial code and also serves as a sample for the question, with only 5 categories (sub-ranges).

making_ordinal_category_distance = ifelse(week$DIST < 13057, "5",
                                          ifelse(week$DIST >= 13057 & week$DIST < 34577, "4",
                                                 ifelse(week$DIST >= 34577 & week$DIST < 55039, "3", 
                                                        ifelse(week$DIST >= 55039 & week$DIST < 79043, "2", "1"))))
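For a fixed set of cut points like these, base R's findInterval() can replace the nested ifelse() calls. This is a swapped-in technique, not something from the thread, and the sample distances below are hypothetical. findInterval() counts how many break points are less than or equal to each value (intervals closed on the left, matching the `>=` / `<` tests above), so subtracting from 5 reproduces the "5 = shortest ... 1 = longest" coding:

```r
# Cut points copied from the nested ifelse() above
breaks <- c(13057, 34577, 55039, 79043)

# Hypothetical sample distances, including values exactly on the cut points
dist <- c(500, 13057, 34576, 34577, 60000, 79043, 120000)

# findInterval() returns 0 below 13057, 1 for [13057, 34577), ...,
# and 4 for values >= 79043
category <- as.character(5 - findInterval(dist, breaks))

# The original nested ifelse(), applied to the same values for comparison
nested <- ifelse(dist < 13057, "5",
                 ifelse(dist >= 13057 & dist < 34577, "4",
                        ifelse(dist >= 34577 & dist < 55039, "3",
                               ifelse(dist >= 55039 & dist < 79043, "2", "1"))))

category
#> [1] "5" "4" "4" "3" "2" "1" "1"
identical(category, nested)
#> [1] TRUE
```

Adding categories then only means adding break points, not more nesting.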

This approach doesn't scale to 100 categories, because it would make the script very messy. Now I want to divide the sub-ranges (categories) by quantile values, so I made a list of quantiles in 1% steps that I want to use for the division, shown below. (I can transform this into whatever your solution requires: a vector, a data frame, anything.)

quant = seq(from = 0.01, to = 1, by = 0.01)
list_for_division = quantile(week$DIST, probs = quant)
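One detail worth checking before using such a vector as breaks: cut() turns k break points into k - 1 intervals, so 100 groups need 101 breaks. A sequence starting at 1% yields 100 values (upper bounds only), while starting at 0% adds the minimum as the lowest break. A small sketch with hypothetical distances standing in for week$DIST:

```r
# Hypothetical stand-in for week$DIST
dist <- c(35463, 43264, 43356, 12000, 80000, 61000)

# Starting at 1% gives 100 quantiles
length(quantile(dist, probs = seq(0.01, 1, by = 0.01)))
#> [1] 100

# Starting at 0% adds the minimum: 101 break points delimit 100 intervals
breaks <- quantile(dist, probs = seq(0, 1, by = 0.01))
length(breaks)
#> [1] 101

# The first and last breaks are the minimum and maximum of the data
unname(breaks[c(1, 101)])
#> [1] 12000 80000
```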

The problem is that I have no idea how to apply this. Do I have to use conditionals like before? Or is there a package that helps with this kind of work? If the former, how can I keep the code readable when I have to handle 100 cases?

Hello,

You can divide your range of values into the desired quantiles with the quantile() function and then use cut() to assign each distance value to the corresponding group:

Data <- data.frame(
  ID = seq.default(1,1000,1),
  DIST = round(runif(n = 1000, min = 1, max = 100000), digits = 0)
)

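# get the quantiles in 1% steps (0%, 1%, ..., 100%: 101 breaks, 100 intervals)
quants <- quantile(Data$DIST, probs = seq.default(0,1,0.01))

# assign each row to its interval; include.lowest = TRUE keeps the
# minimum DIST (equal to the 0% break) from becoming NA
Data$range <- cut(Data$DIST, breaks = quants, include.lowest = TRUE)
Data$group <- cut(Data$DIST, breaks = quants, labels = seq.default(1,100,1), include.lowest = TRUE)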
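One caveat with this approach: if DIST contains many repeated values, several percentiles can coincide, and cut() then stops because its breaks are not unique. A minimal base-R sketch of a rank-based alternative, on hypothetical tied data, is below. Note that this is a different technique: it splits the sorted rows into 100 equal-sized groups by rank instead of by quantile thresholds, so rows with the same distance can end up in different groups. The dplyr package's ntile() takes a similar rank-based approach.

```r
# Hypothetical distances with heavy ties: half of the 1000 rows share one value
dist <- c(rep(100, 500), 101:600)

quants <- quantile(dist, probs = seq(0, 1, 0.01))
anyDuplicated(quants) > 0
#> [1] TRUE

# cut() refuses duplicated breaks, so this returns FALSE
ok <- tryCatch({ cut(dist, breaks = quants); TRUE }, error = function(e) FALSE)
ok
#> [1] FALSE

# Rank-based alternative: ties.method = "first" breaks ties by row order,
# then each block of 10 consecutive ranks becomes one group (1 to 100)
group <- ceiling(100 * rank(dist, ties.method = "first") / length(dist))
range(table(group))
#> [1] 10 10
```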

# result
head(Data)
#>   ID  DIST               range group
#> 1  1 58758 (5.87e+04,6.04e+04]    62
#> 2  2 31963 (3.12e+04,3.23e+04]    34
#> 3  3  4567 (3.41e+03,4.62e+03]     5
#> 4  4 26717 (2.65e+04,2.73e+04]    29
#> 5  5 29541 (2.92e+04,3.03e+04]    32
#> 6  6 26573 (2.65e+04,2.73e+04]    29

Created on 2022-09-23 by the reprex package (v2.0.1)

The Data$range column only shows how the intervals are created by default.
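If you want the ordering from the question's ifelse() code, where the shortest distances get the highest number, you could reverse the labels. A sketch under stated assumptions: the evenly spaced DIST values are hypothetical (chosen so there are no ties), and group_num is an illustrative column name. Because cut() returns a factor, the labels are converted through as.character() to get plain integers:

```r
# Hypothetical evenly spaced distances (no ties), standing in for week$DIST
Data <- data.frame(ID = 1:1000, DIST = seq(100, 100000, by = 100))

quants <- quantile(Data$DIST, probs = seq(0, 1, 0.01))

# labels = 100:1: shortest distances get 100, longest get 1.
# include.lowest = TRUE keeps the minimum DIST (equal to the 0% break) in group 100.
Data$group <- cut(Data$DIST, breaks = quants, labels = 100:1, include.lowest = TRUE)

# Go through the labels to get integers; as.integer() on the factor alone
# would return the level position, not the label
Data$group_num <- as.integer(as.character(Data$group))

Data$group_num[c(1, 1000)]
#> [1] 100   1
```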

Kind regards


It worked, thank you so much!

Wonderful. Please accept the answer to indicate a solution was found.
