Businesses with consecutive sales

Hello,
How do I get the number of businesses that had sales for 2 consecutive years, then 3 consecutive years, then four consecutive years?
Here's an example of what I'm working with

Year <- c(2013, 2012, 2011, 2010, 2013, 2012, 2011, 2010, 2013, 2012, 2011, 2010)
BusinessID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Sales <- c(4, 10, 3, 0, 1, 0, 0, 10, 4, 2, 4, 12)
df <- data.frame(Year, BusinessID, Sales)

> df
   Year BusinessID Sales
1  2010          1     4
2  2011          1    10
3  2012          1     3
4  2013          1     0
5  2010          2     1
6  2011          2     0
7  2012          2     0
8  2013          2    10
9  2010          3     4
10 2011          3     2
11 2012          3     4
12 2013          3    12

The optimal result would be a variable with the IDs of those businesses. Similar to what we get in a which() function.
For this example the end result should be something like:

> consecutive_years_2
NULL

> consecutive_years_3
[1] 1

> consecutive_years_4
[1] 3

Then eventually I would also want for example, "more or equal to 3 consecutive years" etc. Which would give:

> more_or_equal_to_consecutive_years_3
[1] 1 3

Try building on this

suppressPackageStartupMessages({
  library(dplyr)
})
Year <- c(2013, 2012, 2011, 2010, 2013, 2012, 2011, 2010, 2013, 2012, 2011, 2010)
BusinessID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Sales <- c(4, 10, 3, 0, 1, 0, 0, 10, 4, 2, 4, 12)
DF <- data.frame(Year, BusinessID, Sales)

DF <- DF %>% arrange(BusinessID,Year) %>%
  mutate(seq2 = ifelse(Sales > 0 & lag(Sales,1) > 0, 1,NA))

DF
#>    Year BusinessID Sales seq2
#> 1  2010          1     0   NA
#> 2  2011          1     3   NA
#> 3  2012          1    10    1
#> 4  2013          1     4    1
#> 5  2010          2    10    1
#> 6  2011          2     0   NA
#> 7  2012          2     0   NA
#> 8  2013          2     1   NA
#> 9  2010          3    12    1
#> 10 2011          3     4    1
#> 11 2012          3     2    1
#> 12 2013          3     4    1
1 Like

technocrat, you have saved the day again!

I hope this helps other people too.
What I did is, changed the NA to 0 and added another condition so that it doesn't count a consecutive from 2013 to 2010 (i.e. different Businesses). It looked like this:

DF <- df %>% arrange(BusinessID,Year) %>%
  mutate(seq2 = ifelse(Sales > 0 & lag(Sales,1) > 0 & Year-lag(Year,1)==1, 1,0))

> DF
   Year BusinessID Sales seq2
1  2010          1     0    0
2  2011          1     3    0
3  2012          1    10    1
4  2013          1     4    1
5  2010          2    10    0
6  2011          2     0    0
7  2012          2     0    0
8  2013          2     1    0
9  2010          3    12    0
10 2011          3     4    1
11 2012          3     2    1
12 2013          3     4    1

But then this is counting one less for each business I substituted the last one for the value 2 (with some conditions) and then aggregated them:

for (i in 1:NROW(DF)) {
 if (DF$Year[i]==2013 && DF$Sales[i]>0 && DF$Sales[i-1]>0){
   DF$seq2[i]<-2
   }
}

> DF
  Year BusinessID Sales seq2
1  2010          1     0    0
2  2011          1     3    0
3  2012          1    10    1
4  2013          1     4    2
5  2010          2    10    0
6  2011          2     0    0
7  2012          2     0    0
8  2013          2     1    0
9  2010          3    12    0
10 2011          3     4    1
11 2012          3     2    1
12 2013          3     4    2

Then aggregation:

Answer <- aggregate(DF$seq2, by=list(BusinessID=DF$BusinessID), FUN=sum)

> Answer
  BusinessID x
1          1 3
2          2 0
3          3 4

Now I can use subset:

Names_3 <- subset(Answer, x >= 3)

> Names_3
  BusinessID x
1          1 3
3          3 4

> NROW(Names_3)
[1] 2

I have my answer!
Thank you.

2 Likes