summarize number of values greater than by group and threshold for each group

Hi all,

I have got two data.frames:

x <- c(4,8,10)
group <- c(1,2,3)
threshold <- data.frame(group,x)
threshold
# group x
# 1     4
# 2     8
# 3     10

x <- c(3,4,5,8,8,9,9,9,10)
group <- c(1,1,1,2,2,2,3,3,3)
values <- data.frame(group, x) 
values
# group x
# 1     3
# 1     4
# 1     5
# 2     8
# 2     8
# 2     9
# 3     9
# 3     9
# 3     10

I'm looking for a way/formula/command that provides a dataframe with the sums of instances of values greater than or equal to my thresholds for the respective groups. The result should look like this:

# group    x
# 1     2
# 2     3
# 3     1

I can do a group-sorted list of values greater than or equal to a single shared threshold (i.e 3):

aggregate(values[,-1], by = list(values$group), 
                function(x) sum(x >= 5, na.rm = TRUE))

but not for different thresholds for each respective group.

Thanks.

I would join the two data frames and create a summary from that.

x <- c(4,8,10)
group <- c(1,2,3)
threshold <- data.frame(group,x)
threshold
#>   group  x
#> 1     1  4
#> 2     2  8
#> 3     3 10

x <- c(3,4,5,8,8,9,9,9,10)
group <- c(1,1,1,2,2,2,3,3,3)
values <- data.frame(group, x) 
library(dplyr)
values %>% inner_join(threshold, by = "group", suffix = c(".values", ".thresh")) %>% 
  group_by(group) %>% 
  summarize(x = sum(x.values >= x.thresh))
#> # A tibble: 3 x 2
#>   group     x
#>   <dbl> <int>
#> 1     1     2
#> 2     2     3
#> 3     3     1

Created on 2020-06-30 by the reprex package (v0.3.0)

2 Likes

Thank you for your quick and helpful reply. :smiley:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.