I need to combine overlapping years, if there is no more than 1 year gap between periods. If there is two or more years gap I need to keep a separate records for this periods. Example data look like that.
client<- c(1,1,1,1,1,2,2,3,3)
year_from<-c(1993, 1998, 2000,2001, 2008,2007,2007,2006,2020)
year_to<-c(1995, 2020,2005, 2021, 2010, 2010, 2014,2017, 2020)
data<- tibble(client,year_from, year_to)
A tibble: 9 x 3
client year_from year_to
<dbl> <dbl> <dbl>
1 1 1993 1995
2 1 1998 2020
3 1 2000 2005
4 1 2001 2021
5 1 2008 2010
6 2 2007 2010
7 2 2007 2014
8 3 2006 2017
9 3 2020 2020
What I need to get this output:
A tibble: 4 x 3
client year_from year_to
<dbl> <dbl> <dbl>
1 1 1993 1995
2 1 1998 2021
3 2 2007 2014
4 3 2006 2020
What I tried is to create a "dif "column as below.
data %>% group_by(client) %>% mutate(dif=(year_from)-lag(year_to))
A tibble: 11 x 4
# Groups: client [3]
client year_from year_to dif
<dbl> <dbl> <dbl> <dbl>
1 1 1993 1995 NA
2 1 1998 2020 3
3 1 2000 2005 -20
4 1 2008 2021 3
5 1 2008 2010 -13
6 2 2007 2010 NA
7 2 2007 2014 -3
8 2 2007 2020 -7
9 3 2006 2017 NA
10 3 2020 2020 3
11 3 2023 2025 3
After that I have to write a condition that:
- if dif>=2 and year_from corresponding to that row is NOT BETWEEN (year_from , year_to) of any other rows for the same client then keep that record separately
- otherwise collapse rows with min of "year_from" and max of "year_to."
I would appreciate any help. Thanks.