Create bins the tidy way?

Suppose I have a tibble with some numerical variable that goes from a to b. For example:

my_df <- tibble(t = 835:1017) 

I want to add a new bin column as follows:

library(dplyr)

my_df <- tibble(t = 835:1017) 

max_t <- max(my_df$t)

min_t <- min(my_df$t)

end <- min_t + 4 * ceiling(length(min_t:max_t) / 4)

out <- my_df %>% 
  mutate(bin = cut(min_t:max_t, breaks = c(seq(min_t, end, by = 4)), right = FALSE))

out
#> # A tibble: 183 x 2
#>        t bin      
#>    <int> <fct>    
#>  1   835 [835,839)
#>  2   836 [835,839)
#>  3   837 [835,839)
#>  4   838 [835,839)
#>  5   839 [839,843)
#>  6   840 [839,843)
#>  7   841 [839,843)
#>  8   842 [839,843)
#>  9   843 [843,847)
#> 10   844 [843,847)
#> # … with 173 more rows

Created on 2020-04-27 by the reprex package (v0.3.0)

Is there a tidier way of doing this?

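You can calculate the minimum and maximum values directly inside the cut function, because mutate lets you refer to the column t. Binning t itself (rather than min_t:max_t) also ties each bin to its own row, so the result stays correct even if t is unsorted or has gaps. There's also no need to wrap seq in the c function. For example, if you know you want to start with the minimum, have bins of width 4, and have the bins closed on the left, then you can do the following: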

my_df %>% 
  mutate(bin = cut(t, seq(min(t), max(t) + 4, 4), right = FALSE))
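A self-contained version of that pipeline that can be pasted into a fresh R session, assuming only that dplyr is installed (dplyr re-exports tibble()). The bin_start column is an extra, swapped-in alternative to cut(): it computes the numeric left edge of each bin with integer division (%/%), which may be handier than factor labels for group_by(). The names width and bin_start are illustrative only, not part of the original answer.

```r
library(dplyr)

my_df <- tibble(t = 835:1017)

width <- 4  # bin width used throughout the thread (illustrative variable)

out <- my_df %>%
  mutate(
    # Factor labels such as "[835,839)". The breaks run to max(t) + width,
    # so the last break lies strictly above max(t) and, with right = FALSE,
    # every value lands in a left-closed bin [a, a + width).
    bin = cut(t, seq(min(t), max(t) + width, width), right = FALSE),
    # Alternative sketch: numeric left edge of each bin via integer division.
    bin_start = min(t) + width * ((t - min(t)) %/% width)
  )

out
```

Both columns describe the same bins; cut() gives readable interval labels, while the integer-division column stays numeric and sorts naturally.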

There are also some alternative functions, somewhat strangely within ggplot2: cut_interval(), cut_number() and cut_width().
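A minimal sketch using ggplot2's cut_width(), assuming dplyr and ggplot2 are installed. Here boundary = min(t) places a bin edge at the smallest value and closed = "left" plays the role of right = FALSE, so the first bins match the labels in the question. The label of the final bin may be formatted slightly differently from the plain cut() version, so it is worth checking levels(out$bin) before relying on exact labels.

```r
library(dplyr)
library(ggplot2)

my_df <- tibble(t = 835:1017)

out <- my_df %>%
  mutate(
    # width = 4: same bin width as in the question.
    # boundary = min(t): put a bin edge at the minimum value (835).
    # closed = "left": bins of the form [a, a + 4), like right = FALSE.
    bin = cut_width(t, width = 4, boundary = min(t), closed = "left")
  )

out
```

cut_interval() (a given number of equal-width bins) and cut_number() (bins with roughly equal counts) follow the same pattern when a fixed width is not what you need.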
