The functional programming solution is definitely elegant and easy to understand. If I were writing a program, it's what I'd go with.
But, just for fun, here's another solution. You can use your vector of break points to divide the input into bins and count the number of values in each bin. Once the break points are sorted, any value below the first break point is also below the second, and so on, so the number of values below each break is just a cumulative sum of the bin counts.
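To make the bin-and-cumsum idea concrete, here is a minimal sketch of the intermediate steps on the sample data used further down, assuming the breaks are already sorted (the variable names bins and counts are just for illustration):

```r
# Sample data; breaks are already in increasing order here
a <- c(-2, -1, 1, 2, 3)
breaks <- c(0, 2, 3)

# Bin index for each value: 0 means "below the first break"
bins <- findInterval(a, breaks)                        # 0 0 1 2 3
# Shift to 1-based bins and count values per bin, one bin per break
# (values at or above the last break land in bin 4 and are ignored)
counts <- tabulate(bins + 1, nbins = length(breaks))   # 2 1 1
# Values below break k = values in bins 1..k
cumsum(counts)                                         # 2 3 4
```

Each entry of the cumulative sum matches sum(a < b) for the corresponding break.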
I also threw in a couple of calls to order(), because findInterval() requires a sorted vector of break points, but that shouldn't tie our hands.
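under_breaks <- function(x, breaks) {
  # findInterval() needs sorted breaks, so remember the sorting permutation
  break_order <- order(breaks)
  # Bin k (after the +1) holds values in [sorted break k-1, sorted break k)
  intervals <- 1 + findInterval(x, breaks[break_order])
  # One count per break; nbins stops the result from coming up short
  # (and the cumsum index from returning NA) when no value reaches the top breaks
  interval_counts <- tabulate(intervals, nbins = length(breaks))
  # Cumulative counts, mapped back to the original break order
  cumsum(interval_counts)[order(break_order)]
}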
a <- c(-2,-1,1,2,3)
under_breaks(a, c(0, 2, 3))
# [1] 2 3 4
under_breaks(a, c(3, 0, 2))
# [1] 4 2 3
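One edge case is worth checking: tabulate() only returns as many counts as the largest bin index it sees, so when no values reach the upper breaks, the cumulative sum is shorter than the breaks vector and indexing it produces NA. A minimal demonstration, with made-up breaks chosen to trigger the problem; passing nbins = length(breaks) keeps one count per break:

```r
# All values fall below the last two breaks
x <- c(-2, -1, 1, 2, 3)
breaks <- c(0, 5, 10)

bins <- 1 + findInterval(x, breaks)   # 1 1 2 2 2

# Without nbins, tabulate() stops at the largest bin it sees,
# so there are fewer counts than breaks
short <- cumsum(tabulate(bins))       # 2 5
short[3]                              # NA

# With one bin per break, the last break correctly gets all 5 values
fixed <- cumsum(tabulate(bins, nbins = length(breaks)))   # 2 5 5
```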
This approach sorts the breaks once and then drops each value into a bin, instead of comparing every value against every break, so it doesn't slow down as much as the number of breaks grows.
library(microbenchmark)
library(purrr)
func_purrr <- function(x, breaks) {
  map_int(breaks, ~ sum(x < .x))
}
func_vapply <- function(x, breaks) {
  vapply(breaks, function(b) sum(x < b), integer(1))
}
set.seed(007)
big_a <- rpois(1000, 20)
big_breaks <- rpois(100, 20)
microbenchmark(
  interval = under_breaks(big_a, big_breaks),
  purrr    = func_purrr(big_a, big_breaks),
  vapply   = func_vapply(big_a, big_breaks)
)
# Unit: microseconds
#     expr   min     lq    mean median     uq    max neval
# interval  64.0  73.95  99.303  81.05 104.35  539.6   100
#    purrr 301.6 318.10 456.909 343.45 429.75 6590.5   100
#   vapply 236.1 239.95 307.319 248.05 355.65 2274.1   100
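Before reading too much into the timings, it's worth confirming that the binned version returns exactly the same counts as the simple comparison. Here's a sketch of a randomized check covering duplicated values, duplicated breaks, and breaks outside the data range. The under_breaks() definition is repeated so the snippet runs on its own (it passes nbins = length(breaks) to tabulate() so that breaks above every value don't come back as NA), and naive_under_breaks is just a hypothetical name for the vapply approach:

```r
# Binned version: sort the breaks, bin the values, take cumulative counts
under_breaks <- function(x, breaks) {
  break_order <- order(breaks)
  intervals <- 1 + findInterval(x, breaks[break_order])
  interval_counts <- tabulate(intervals, nbins = length(breaks))
  cumsum(interval_counts)[order(break_order)]
}

# Straightforward reference: compare every value with every break
naive_under_breaks <- function(x, breaks) {
  vapply(breaks, function(b) sum(x < b), integer(1))
}

set.seed(1)
for (i in 1:200) {
  # Random sizes, repeated values, and breaks that may lie outside the data
  x <- rpois(sample(1:50, 1), 10)
  breaks <- sample(-5:30, sample(1:20, 1), replace = TRUE)
  stopifnot(identical(under_breaks(x, breaks), naive_under_breaks(x, breaks)))
}
```

Both functions return integer vectors, so identical() checks the values and the type in one go.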
Of course, giving up the readability of the functional version is only worth it if the task was actually taking too long.