How to bin this data?

Hi, I want to create the following bins for my variable:
0-10, 11-25, 26-40, 41-55.

How do I do it, please?

I think cut() will do what you want.

DF <- data.frame(Vairable=sample.int(55,size = 25))
DF
#>    Vairable
#> 1        35
#> 2        47
#> 3         1
#> 4        44
#> 5        55
#> 6         4
#> 7        48
#> 8         7
#> 9        53
#> 10       54
#> 11       23
#> 12       28
#> 13       10
#> 14       33
#> 15       34
#> 16       46
#> 17       25
#> 18        3
#> 19        5
#> 20       45
#> 21       15
#> 22       19
#> 23       30
#> 24       42
#> 25       41
DF$bins <- cut(DF$Vairable,breaks = c(0,10,25,40,55),include.lowest = TRUE)
DF
#>    Vairable    bins
#> 1        35 (25,40]
#> 2        47 (40,55]
#> 3         1  [0,10]
#> 4        44 (40,55]
#> 5        55 (40,55]
#> 6         4  [0,10]
#> 7        48 (40,55]
#> 8         7  [0,10]
#> 9        53 (40,55]
#> 10       54 (40,55]
#> 11       23 (10,25]
#> 12       28 (25,40]
#> 13       10  [0,10]
#> 14       33 (25,40]
#> 15       34 (25,40]
#> 16       46 (40,55]
#> 17       25 (10,25]
#> 18        3  [0,10]
#> 19        5  [0,10]
#> 20       45 (40,55]
#> 21       15 (10,25]
#> 22       19 (10,25]
#> 23       30 (25,40]
#> 24       42 (40,55]
#> 25       41 (40,55]

Created on 2022-06-04 by the reprex package (v2.0.1)

You can also assign labels to the bins. Because your data are integers, 11-25 might be clearer than (10, 25].

DF$bins <- cut(DF$Vairable, 
               breaks = c(0,10,25,40,55), 
               labels = c("0-10", "11-25", "26-40", "41-55"),
               include.lowest = TRUE)

Hi and thank you to both of you,

What do ( and ] mean here?
Do I need to specify them, or does the cut() function do it by itself?

Some of the numbers are placed between [ ] and some between ( ].

I'd much appreciate an explanation, thank you.

If I need a value to be between 11 and 25, what is best to do? I mean that both 11 and 25 are included in that particular bin.

(10, 25] means a bin defined by 10 < x <= 25. The ( means < and the ] means <=. Similarly, [0,10] means 0 <= x <= 10. Since you have integers, (10, 25] acts as the bin 11 - 25; it excludes 10 but accepts 11.
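To make the bracket notation concrete, here is a small sketch that runs cut() on hand-picked boundary values, with and without include.lowest. The vector x is made up for illustration; the breaks are the ones used earlier in the thread.

```r
# Boundary values chosen by hand to probe the edges of the bins.
x <- c(0, 10, 11, 25, 26)
breaks <- c(0, 10, 25, 40, 55)

# Default: every bin is (a, b], so 0 is not in the first bin (0, 10].
no_lowest <- cut(x, breaks = breaks)
as.character(no_lowest)
#> [1] NA        "(0,10]"  "(10,25]" "(10,25]" "(25,40]"

# include.lowest = TRUE closes the first bin on the left: [0, 10].
with_lowest <- cut(x, breaks = breaks, include.lowest = TRUE)
as.character(with_lowest)
#> [1] "[0,10]"  "[0,10]"  "(10,25]" "(10,25]" "(25,40]"
```

So cut() picks the brackets by itself; include.lowest only changes the first bin, which is why it is the only one printed with [ on the left.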

Is it possible to do something like:

[10, 25], so 10 <= x <= 25?

You cannot define [10,25] in the middle of your range using the cut() function. cut() closes every bin on the same side (the right by default, or the left with right = FALSE), whereas a middle bin [10, 25] would need the neighboring bins to be [0, 10) and (25, 40] to avoid matching two bins at the boundary values. The cut() function will not do that. You can manually define whatever bins you want with the case_when() function from dplyr.
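For integer data there is a possible workaround that stays within cut(): make every bin closed on the left with right = FALSE and shift the breaks up by one. This is only a sketch and it assumes the values are whole numbers; a value such as 25.5 would land in the "11-25" bin. The vector x is made up for illustration.

```r
# Boundary values chosen by hand to check both ends of each bin.
x <- c(0, 10, 11, 25, 26, 40, 41, 55)

# right = FALSE makes every bin [a, b): closed on the left, open on the right.
# With breaks shifted up by one, [11, 26) holds the integers 11..25.
bins <- cut(x,
            breaks = c(0, 11, 26, 41, 56),
            labels = c("0-10", "11-25", "26-40", "41-55"),
            right = FALSE)
data.frame(x, bins)
```

Here 11 and 25 both end up in "11-25", while 10 stays in "0-10" and 26 moves to "26-40".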

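library(dplyr)  # provides mutate() and case_when()

DF <- data.frame(Variable = sample.int(55, size = 25))

# Each condition closes its bin at both ends; the first matching condition wins.
DF <- DF |> mutate(bins = case_when(
  Variable >= 0 & Variable <= 10 ~ "0-10",
  Variable >= 11 & Variable <= 25 ~ "11-25",
  Variable >= 26 & Variable <= 40 ~ "26-40",
  Variable >= 41 & Variable <= 55 ~ "41-55",
  TRUE ~ "Out of range"  # fallback for anything outside 0-55
))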

That does the same thing as in the previous answers but you can tune it however you want.
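One example of such tuning, as a sketch: with the conditions written as above, a missing value fails every comparison and falls through to the TRUE branch, so it would be labelled "Out of range". If NA should stay NA, a first condition can catch it. The vector x below is made up for illustration and uses only the first two bins to keep it short.

```r
library(dplyr)

# Hand-picked values: out of range on both sides, bin edges, and a missing value.
x <- c(-3, 0, 10, 11, 25, 60, NA)

bins <- case_when(
  is.na(x)          ~ NA_character_,  # keep missing values missing
  x >= 0 & x <= 10  ~ "0-10",
  x >= 11 & x <= 25 ~ "11-25",
  TRUE              ~ "Out of range"  # everything else, here -3 and 60
)
data.frame(x, bins)
```

The order of the conditions matters, because case_when() uses the first one that is TRUE for each value.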

Thank you, much appreciated.

I have just found a good read:

https://stackoverflow.com/questions/4396290/what-does-this-square-bracket-and-parenthesis-bracket-notation-mean-first1-last
