# How can I assign a label or value to each row according to the quantile range its target value falls in?

Hello, I have a dataset named 'week' that contains a variable named 'DIST'. Its structure looks like this:

``````
ID DIST
1  35463
2  43264
3  43356
...
``````

I want to assign a value to each row according to the sub-range its 'DIST' value belongs to, so I wrote the code below. It is a first attempt and also a sample for this question, with only 5 categories (sub-ranges).

``````
making_ordinal_category_distance = ifelse(week$DIST < 13057, "5",
  ifelse(week$DIST >= 13057 & week$DIST < 34577, "4",
    ifelse(week$DIST >= 34577 & week$DIST < 55039, "3",
      ifelse(week$DIST >= 55039 & week$DIST < 79043, "2", "1"))))
``````
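For reference, the five-way split above can also be written as a single `cut()` call. This is only a sketch: `dist` is a made-up stand-in for `week$DIST`, and the break points are the ones from the nested `ifelse()`.

```r
# Hypothetical stand-in for week$DIST; the values are made up for illustration.
dist <- c(5000, 13057, 20000, 34577, 60000, 79043, 90000)

# Explicit break points taken from the nested ifelse() above.
# right = FALSE makes each interval closed on the left, [a, b),
# which matches the ">= lower & < upper" conditions.
breaks <- c(-Inf, 13057, 34577, 55039, 79043, Inf)
category <- cut(dist, breaks = breaks, labels = 5:1, right = FALSE)

# The nested ifelse() logic, shortened, for comparison.
nested <- ifelse(dist < 13057, "5",
          ifelse(dist < 34577, "4",
          ifelse(dist < 55039, "3",
          ifelse(dist < 79043, "2", "1"))))

# Both approaches give the same labels for this sample.
identical(as.character(category), nested)
```

Note that `cut()` returns a factor, so `as.character()` is needed if plain strings like those from `ifelse()` are wanted.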

This approach doesn't scale to 100 categories, because it would make the script very messy. I now want to divide the sub-ranges (categories) by quantile values, so I computed the quantiles in 1% steps to use for the division, as shown below. (I can transform this into a vector, data frame, or whatever your solution requires.)

``````
quant = seq(from = 0.01, to = 1, by = 0.01)
list_for_division = quantile(week$DIST, probs = quant)
``````

The problem is that I have no idea how to apply this. Do I have to use conditionals like before, or is there a package that helps with this kind of work? If it has to be conditionals, how can I keep the code readable when there are 100 cases to handle?

Hello,

you can divide your range of values with the `quantile()` function into the desired quantiles and then use `cut()` to assign each distance value to the corresponding group:

``````
Data <- data.frame(
  ID = seq.default(1, 1000, 1),
  DIST = round(runif(n = 1000, min = 1, max = 100000), digits = 0)
)

# get the quantile break points in 1% steps (0%, 1%, ..., 100%)
quants <- quantile(Data$DIST, probs = seq.default(0, 1, 0.01))

# apply to new columns; include.lowest = TRUE keeps the minimum value
# in the first interval instead of turning it into NA
Data$range <- cut(Data$DIST, breaks = quants, include.lowest = TRUE)
Data$group <- cut(Data$DIST, breaks = quants, labels = seq.default(1, 100, 1),
                  include.lowest = TRUE)

# result
head(Data)
#>   ID  DIST               range group
#> 1  1 58758 (5.87e+04,6.04e+04]    62
#> 2  2 31963 (3.12e+04,3.23e+04]    34
#> 3  3  4567 (3.41e+03,4.62e+03]     5
#> 4  4 26717 (2.65e+04,2.73e+04]    29
#> 5  5 29541 (2.92e+04,3.03e+04]    32
#> 6  6 26573 (2.65e+04,2.73e+04]    29
``````

Created on 2022-09-23 by the reprex package (v2.0.1)

The `Data$range` column is only there to show how `cut()` builds the intervals.
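The interval convention matters at the lower edge. With its default arguments, `cut()` builds intervals that are open on the left, `(a, b]`, so a value equal to the smallest break (here the 0% quantile, i.e. the minimum) gets `NA` unless `include.lowest = TRUE` is set. A minimal sketch with a made-up vector `x` and quartiles instead of percentiles:

```r
# Small made-up vector so the behaviour is easy to check by eye.
x <- c(10, 20, 30, 40, 50, 60, 70, 80)

# Quartile break points (0%, 25%, 50%, 75%, 100%); the 0% break is min(x).
q <- quantile(x, probs = seq(0, 1, by = 0.25))

# Default intervals are (a, b], so min(x) falls outside the first one.
default_groups <- cut(x, breaks = q, labels = 1:4)

# include.lowest = TRUE closes the first interval: [a, b].
closed_groups <- cut(x, breaks = q, labels = 1:4, include.lowest = TRUE)

is.na(default_groups[1])   # TRUE: the minimum got no group
anyNA(closed_groups)       # FALSE: every value is assigned
as.integer(closed_groups)  # group number per value
```

If the data contain many tied values, some quantiles can coincide, and `cut()` then stops with an error because the breaks are not unique. In that case the breaks would need to be de-duplicated (for example with `unique()`), which also means fewer than 100 groups.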

Kind regards


It worked, thank you so much!

Wonderful. Please accept the answer to indicate a solution was found.