How can I assign a label or value to each row according to the quantile range its value falls into?

ang · September 23, 2022, 8:30am

Hello, I have a dataset named 'week' that contains a variable named 'DIST'. It is structured like this:

ID DIST
1  35463 
2  43264
3  43356
...

I want to assign a value to each row according to the sub-range its 'DIST' value falls into, so I wrote the code below. This is a first attempt and a sample for the question, with only 5 categories (sub-ranges).

making_ordinal_category_distance = ifelse(week$DIST < 13057, "5",
                                          ifelse(week$DIST >= 13057 & week$DIST < 34577, "4",
                                                 ifelse(week$DIST >= 34577 & week$DIST < 55039, "3", 
                                                        ifelse(week$DIST >= 55039 & week$DIST < 79043, "2", "1"))))

This won't scale to 100 categories, because it would make the script very messy. I now want to divide the sub-ranges (categories) by quantile values, so I computed the quantiles in 1% steps to use as division points, as shown below. (I can convert this into whatever your solution requires: a vector, a data frame, anything.)

quant = seq(from = 0.01, to = 1, by = 0.01)
list_for_division = quantile(week$DIST, probs = quant)

But the problem is that I have no idea how to apply this. Do I have to use conditionals as before, or is there a package that helps with this kind of work? If the former, how can I keep the code readable when I have to handle 100 cases?

FactOREO · September 23, 2022, 9:06am

Hello,

you can divide your range of values with the quantile() function into the desired quantiles and then use cut() to assign each distance value to the corresponding group:

Data <- data.frame(
  ID = seq.default(1,1000,1),
  DIST = round(runif(n = 1000, min = 1, max = 100000), digits = 0)
)

# get the quantiles in 1% increase steps
quants <- quantile(Data$DIST, probs = seq.default(0,1,0.01))

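# apply to new columns; include.lowest = TRUE keeps the minimum value
# (which equals the 0% quantile) from becoming NA
Data$range <- cut(Data$DIST, breaks = quants, include.lowest = TRUE)
Data$group <- cut(Data$DIST, breaks = quants, labels = seq.default(1,100,1), include.lowest = TRUE)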

# result
head(Data)
#>   ID  DIST               range group
#> 1  1 58758 (5.87e+04,6.04e+04]    62
#> 2  2 31963 (3.12e+04,3.23e+04]    34
#> 3  3  4567 (3.41e+03,4.62e+03]     5
#> 4  4 26717 (2.65e+04,2.73e+04]    29
#> 5  5 29541 (2.92e+04,3.03e+04]    32
#> 6  6 26573 (2.65e+04,2.73e+04]    29

Created on 2022-09-23 by the reprex package (v2.0.1)

The Data$range column is only there to show how the intervals are constructed.
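One edge case is worth checking on a small example. cut() builds right-closed intervals (a, b], so unless include.lowest = TRUE is set, the smallest value (which equals the 0% quantile) is assigned NA. The sketch below uses a made-up vector and quartiles instead of percentiles to keep the output short; the names dist, breaks and the three *_groups variables are only for illustration. It also shows reversed labels, in case the smallest distances should get the highest number as in the nested ifelse() from the question.

```r
# Toy data (hypothetical values, not taken from the 'week' dataset)
dist <- c(10, 20, 30, 40, 50, 60, 70, 80)

# Quartile breaks instead of 1% steps, to keep the example small
breaks <- quantile(dist, probs = seq(0, 1, 0.25))

# Default: intervals are (a, b], so the minimum value falls outside
# the first interval and becomes NA
default_groups <- cut(dist, breaks = breaks, labels = 1:4)

# include.lowest = TRUE closes the first interval on the left: [a, b]
fixed_groups <- cut(dist, breaks = breaks, labels = 1:4,
                    include.lowest = TRUE)

# Reversed labels: the smallest distances get the highest number
reversed_groups <- cut(dist, breaks = breaks, labels = 4:1,
                       include.lowest = TRUE)

data.frame(dist, default_groups, fixed_groups, reversed_groups)
```

If DIST contains many tied values, neighbouring quantiles can coincide, and cut() then stops with an error that 'breaks' are not unique. Passing unique(quants) as breaks and dropping the fixed labels is one possible workaround, at the cost of getting fewer than 100 groups.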

Kind regards

ang · September 26, 2022, 12:31am

It worked, thank you so much!

FactOREO · September 26, 2022, 4:41am

Wonderful. Please accept the answer to indicate a solution was found.
