Create Dummy variable in R according to whether variable appears multiple times within certain time frame

Hi, I have troubles creating a dummy variable in R according to whether variables appear multiple times in a dataset within a certain time frame. I tried already various commands but none of them actually delivers the result I need...

I tried to simplify the two tables I use to illustrate the problem: The first one contains (beside some other variables) the columns "year" and company "ID". The second one is a subset of the first one with the same structure . Table 1 looks like:

YEAR COMPANY ID

2004 x

2004 y

2005 a

2005 z

2006 x

2006 a

2007 y

2008 b

The second one looks like this:

YEAR COMPANY ID

2004 x

2004 y

2005 a

2006 x

2008 b

I need a dummy variable as new column in the second table that checks for every Company ID (i.e. every row) if this ID appears in the first table multiple times within a time frame of 3 years starting with the year indicated in the same row as the company ID in the second table. e.g. first row table2: dummy =1, when company x appears multiple times in table 1 during the years 2004-2006. If not the dummy shall be equal to 0.

Is there any option how to create this dummy? Thanks for your help!

This would be one way to do it (I'm assuming that by "multiple times" you mean more than 1)

library(dplyr)
library(fuzzyjoin)

df1 <- data.frame(stringsAsFactors=FALSE,
         YEAR = c(2004L, 2004L, 2005L, 2005L, 2006L, 2006L, 2007L, 2008L),
   COMPANY_ID = c("x", "y", "a", "z", "x", "a", "y", "b")
)

df2 <- data.frame(stringsAsFactors=FALSE,
         YEAR = c(2004L, 2004L, 2005L, 2006L, 2008L),
   COMPANY_ID = c("x", "y", "a", "x", "b")
)

df2 %>% 
    mutate(end_year = YEAR + 2) %>%  
    fuzzy_left_join(df1,
                    by = c("YEAR" = "YEAR",
                           "end_year" = "YEAR",
                           "COMPANY_ID" = "COMPANY_ID"),
                    match_fun = list(`<=`, `>=`, `==`)) %>% 
    count(YEAR = YEAR.x, COMPANY_ID = COMPANY_ID.x, name = "dummy") %>%
    mutate(dummy = if_else(dummy > 1, 1, 0))
#> # A tibble: 5 x 3
#>    YEAR COMPANY_ID dummy
#>   <int> <chr>      <dbl>
#> 1  2004 x              1
#> 2  2004 y              0
#> 3  2005 a              1
#> 4  2006 x              0
#> 5  2008 b              0

Created on 2019-11-25 by the reprex package (v0.3.0.9000)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.