sort values into groups by ranges

sivan · April 11, 2020, 10:19am

Hi
I have 2 data frames: the 1st includes numeric values (x column), and the 2nd set of ranges with unique IDs
I would like to add the 1st df a column for group ID, with default NA (which means - no group). For rows which df$x fall within a range - I want to set the group ID

df1<-data.frame(x=c(68, 39, 1, 34, 87, 43, 14, 82, 59, 21))
df.ranges<-data.frame(group=c("A", "B", "C"),
                     from=c(40, 20, 80),
                     to=c(70, 30, 90))

df1$group<-NA

I can do it with for loop

library(dplyr)
#expected results
for (i in 1:length(df.ranges$group)){
 df1$group[which(between(df1$x, df.ranges$from[i], df.ranges$to[i]))]<-levels(factor(df.ranges$group[i]))
}
df1

but I wondered if there is a vectorization method to do it (without loop) ?

mattwarkentin · April 11, 2020, 4:04pm

Hi @sivan,

This is a good use-case for using the {dplyr} function case_when(...)

df1 %>% 
  mutate(group = case_when(
    between(x, 40, 70) ~ "A",
    between(x, 20, 30) ~ "B",
    between(x, 80, 90) ~ "C",
    TRUE ~ NA_character_
  ))

sivan · April 11, 2020, 4:42pm

Thanks @mattattwarkentin
Actually, the data frame with the ranges is big, and generated automatically by other function. So, a solution based on the 2nd data frame ranges is not suiting to me.
I simplify the scenario for the example

I also tried to use inrange() {data.table} - which mark as TRUE any value, which fall into one of the ranges. The problem with inrange() that I don't know how to assign to the relevant group...

mattwarkentin · April 11, 2020, 5:18pm

Ah I see. This code works but it's a bit hacky, I think. I wrote a helper function which returns the group label. Then you can apply this to your df1. Not sure this is any better than a for loop, expect maybe just a little easier to reason about.

library(purrr)
which_group <- function(x, df) {
  group_index <- pmap_lgl(df, ~between(x, ..2, ..3)) %>% which()
  if(!is_empty(group_index)) {
  df.ranges %>% 
    slice(group_index) %>% 
    pull(group)
  } else {
    NA
  }
}

df1 %>% 
  mutate(group = map_chr(x, which_group, df.ranges))

The function which_group() assumes df has three columns which are the group labels, starting value, and ending value, in that order.

system · May 2, 2020, 5:18pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.