How to find all possible date combinations for each group to find values' sum equal to zero

Hello!

I have a dataframe that looks like this:

 group    date     value
    a   2020-06-01   1
    a   2020-06-02   1
    a   2020-06-03  -1
    b   2020-06-01   2
    b   2020-06-02  -1
    b   2020-06-03  -1
    c   2020-06-02  -3
    c   2020-06-03   3
    c   2020-06-04   2

For each date combination I want to find the groups that the value's sum is equal to zero.
Note that I don't want to sum values across different groups.
For exemple, using this dataframe, I would like an output that returns me this information:

2020-06-01 and 2020-06-03: a
2020-06-02 and 2020-06-03: a, c
2020-06-01 and 2020-06-02 and 2020-06-03: b

Thanks for your help!

library(tidyverse)
library(lubridate)
(df<- tribble(~group ,   ~date     ,~value,
"a",  " 2020-06-01",   1,
"a",  " 2020-06-02",   1,
"a",  " 2020-06-03",  -1,
"b",  " 2020-06-01",   2,
"b",  " 2020-06-02",  -1,
"b",  " 2020-06-03",  -1,
"c",  " 2020-06-02",  -3,
"c",  " 2020-06-03",   3,
"c",  " 2020-06-04",   2) %>% mutate(date=ymd(date)))


#date combinations
uniqs <- unique(df$date)


combs <- map(1:length(uniqs),
    ~combn(x = uniqs,
           m=.,
           simplify = FALSE
    )) %>% flatten


solutions1 <- map(seq_along(combs),
   ~ {filter(df,date %in% combs[[.x]]) %>% group_by(
         group)  %>% summarise(value=sum(value)) %>% 
  filter(value==0) %>% pull(group)} %>% paste0(collapse=" & ")) %>% as.character()

combs_char <- map(combs,
                 ~ paste(format.Date(.,"%Y-%m-%d"),collapse=" & "))
names(solutions1) <-combs_char

tidy_solutions <- enframe(solutions1)

(tidy_solutions_min <- filter(tidy_solutions,
                             value != ""))
# # A tibble: 6 x 2
# name                                              value
# <chr>                                             <chr>
# 1 2020-06-01 & 2020-06-03                           a    
# 2 2020-06-02 & 2020-06-03                           a & c
# 3 2020-06-01 & 2020-06-02 & 2020-06-03              b & c
# 4 2020-06-01 & 2020-06-03 & 2020-06-04              a    
# 5 2020-06-02 & 2020-06-03 & 2020-06-04              a    
# 6 2020-06-01 & 2020-06-02 & 2020-06-03 & 2020-06-04 b

hmmm possibly needs more cleaning up as C is satisfied for 2020-06-02 & 2020-06-03 and
there is no value for C for 2020-06-01 which falsely allows 2020-06-01 & 2020-06-02 & 2020-06-03 to be rated for c

this change seems to do it; require the number of dates to match

solutions1 <- map(seq_along(combs),
   ~ {filter(df,date %in% combs[[.x]]) %>% group_by(
         group)  %>% summarise(value=sum(value),n=n(),num_dates=length( combs[[.x]])) %>% 
  filter(value==0,
         n==num_dates) %>% pull(group)} %>% paste0(collapse=" & ")) %>% as.character()

combs_char <- map(combs,
                 ~ paste(format.Date(.,"%Y-%m-%d"),collapse=" & "))
names(solutions1) <-combs_char

tidy_solutions <- enframe(solutions1)

(tidy_solutions_min <- filter(tidy_solutions,
                             value != ""))
# A tibble: 3 x 2
  name                                 value
  <chr>                                <chr>
1 2020-06-01 & 2020-06-03              a    
2 2020-06-02 & 2020-06-03              a & c
3 2020-06-01 & 2020-06-02 & 2020-06-03 b    

Awesome!
Thats's exactly what I need.

Thank you very much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.