For each group get another group by comparing two other columns

Hi everyone,

I have a little problem which I simply cannot get solved. I have a dataframe where I want to compare to columns for each group. When the two columns match, I want to extract all the rows from a fourth colum that are one group as well. This sounds super strange, so I tried to create an example. It is not the best, but I think it shows the point more or less:)

set.seed(2)
dr = seq(as.Date("2020-01-01"),as.Date("2021-01-01"), by = "day")
df = data.frame(
  date.x = dr,
  date.y = sample(dr, size = length(dr), replace = T)
)
g1 = rep(seq(1,length(dr), by = 10), each = 10)
g2 = rep(seq(1,length(dr), by = 5), each = 5)
df["g1"] = g1[1:nrow(df)]
df["g2"] = g2[1:nrow(df)]
       date.x     date.y g1 g2
1  2020-01-01 2020-12-06  1  1
2  2020-01-02 2020-07-16  1  1
3  2020-01-03 2020-09-18  1  1
4  2020-01-04 2020-09-29  1  1
5  2020-01-05 2020-12-14  1  1
6  2020-01-06 2020-07-22  1  6
7  2020-01-07 2020-10-23  1  6
8  2020-01-08 2020-06-26  1  6
9  2020-01-09 2020-03-15  1  6
10 2020-01-10 2020-05-10  1  6

So I want to comple date.x and date.yfor each group of g1. When they match I want to look at the column g2and extract all the same values.
So I received the tip about doing it like this:

df %>% 
  group_by(g1, g2) %>% 
  filter(
    g2 == g2[date.x == date.y]
  )

But this does not work. Maybe someone has a tip here:) Thanks a lot already!

Hello,

For your description, it seems you want this:

library(dplyr)
df %>%
    group_by(g1, g2) %>%
    filter((date.x == date.y) & (g1 == g2))

However, it leads to no rows, as there is no row where date.x matches date.y and g1 matches g2. So I'm not sure whether I understood your requirement correctly or not.

I am sorry, I think my quesion is really bad. g1and g2do not need to be the same. I just want to check if date.x is the same as date.yfor each group of g1. And the problem is also that the filter(date.x == date.y)will always make that it only returns one row. But for example if date.x == date.yis TRUE than I want to return all the rows which have the same values in g2for the row where date.xand date.ymatch.

I am really sorry, I think it did not really make things clearer...

This is my understanding:

  1. you want to get g2 value of that row where date.x matches date.y
  2. then you want all rows where g2 has that same value.

If it's right, it'll do the job:

df[df["g2"] == df[df["date.x"] == df["date.y"], "g2"],]

But to be frank, I'm not confident I understood you. May be someone else will help you out.

1 Like

Thank you so much for your help! And sorry about my bad problem description...
I kind of figured it out. This does more or less what I want it to do:)

df %>% 
  group_by(g1, g2) %>% 
  mutate(
   new_var = ifelse(any(date.x == date.y), g2, NA) 
  ) %>% 
  filter(
    !is.na(new_var)
  )