 # sum() returning a vector

I am doing some calculations using R and I was wondering if there's a way to perform a `sum()` with multiple criteria in a vectorized way.

As a simple example:

``````a=c(-2,-1,1,2,3);
sum(a<0) #Returns 2
sum(a<2) #Returns 3
sum(a<3) #Returns 4
``````

Can this operation be done in a simple way so that it returns a list (2,3,4), defining another vector `b=c(0,2,3)`?

You could be interested in `purrr` and functional programming

``````library(purrr)
a <- c(-2,-1,1,2,3)
map_int(c(0, 2, 3), ~ sum(a < .x))
#>  2 3 4
``````

You can also use base R but `purrr` has a lot of advantage

``````a <- c(-2,-1,1,2,3)
lapply(c(0, 2, 3), function(x) sum(a < x))
#> []
#>  2
#>
#> []
#>  3
#>
#> []
#>  4
sapply(c(0, 2, 3), function(x) sum(a < x))
#>  2 3 4
``````

Created on 2019-09-18 by the reprex package (v0.3.0)

3 Likes

The functional programming solution is definitely elegant and easy to understand. If I were writing a program, it's what I'd go with.

But, just for fun, here's another solution. You can use your vector of break points to divide the input into bins and count the number of values in each bin. If a value is below the first break point, then it's below the second, and so on. So the number below each break is just a cumulative sum.

I also threw in a couple calls to `order`, because `findInterval` demands a sorted vector of break points, but that shouldn't bind our hands.

``````under_breaks <- function(x, breaks) {
break_order <- order(breaks)
intervals <- 1 + findInterval(x, breaks[break_order])
interval_counts <- tabulate(intervals)
cumsum(interval_counts)[order(break_order)]
}

a <- c(-2,-1,1,2,3)
under_breaks(a, c(0, 2, 3))
#  2 3 4
under_breaks(a, c(3, 0, 2))
#  4 2 3
``````

This way avoids a lot of repetitive checking, so it doesn't slow down as much with more breaks.

``````library(microbenchmark)
library(purrr)

func_purrr <- function(x, breaks) {
map_int(breaks, ~ sum(x < .x))
}

func_vapply <- function(x, breaks) {
vapply(breaks, function(b) sum(x < b), integer(1))
}

set.seed(007)
big_a <- rpois(1000, 20)
big_breaks <- rpois(100, 20)
microbenchmark(
interval = under_breaks(big_a, big_breaks),
purrr = func_purrr(big_a, big_breaks),
vapply = func_vapply(big_a, big_breaks)
)
# Unit: microseconds
#      expr   min     lq    mean median     uq    max neval
#  interval  64.0  73.95  99.303  81.05 104.35  539.6   100
#     purrr 301.6 318.10 456.909 343.45 429.75 6590.5   100
#    vapply 236.1 239.95 307.319 248.05 355.65 2274.1   100
``````

Of course, optimization's only worth the sacrifice if the task was actually taking too long.

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.