Lag with multiple variables

Hello!

I'm trying to use lag(), and it works great with a single variable. But when I add multiple variables, I don't get the right answers. I'm trying to add the previous two values using lag() in a new column. Any help would be appreciated.

library(dplyr)      # %>%, mutate(), group_by(), summarize(), lag()
library(lubridate)  # ym()

vol <- data.frame(
  Date = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  Country = c("CA","CA","CA","CA","CA","CA","US","US","US","US","US","US",
              "CA","CA","CA","CA","CA","CA","US","US","US","US","US","US",
              "CA","CA","CA","CA","CA","CA","US","US","US","US","US","US",
              "CA","CA","CA","CA","CA","CA","US","US","US","US","US","US"),
  Type = c("A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D",
           "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D",
           "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D",
           "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D"),
  Sales = c(100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

vol <- vol %>%
  mutate(Date = ym(Date)) %>%
  group_by(Date, Country, Type) %>%
  ungroup() %>%
  summarize(Sales = sum(Sales))

vol <- vol %>%
  mutate(sum_lag_2 = lag(Sales, 1) + lag(Sales, 2))

# Also tried the following with no luck
# group_by(Type) %>%
#   mutate(sum_lag_2 = lag(Sales, 1, order_by = Type) + lag(Sales, 2, order_by = Type))

Thank you!

You should ungroup() AFTER summarize().

Also you should arrange() before using lag() to make sure the data is in the right order. You probably want to do the lag inside a group_by() as well, since you don't want to combine different kinds of sales.

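How about this? The answers are all zero because, after summarize(), the data are still grouped by Date and Country, and none of those groups is bigger than 1 row. With nothing to look back at, lag(Sales, 1) and lag(Sales, 2) always fall back to the default of 0.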

vol <- vol %>%
  mutate(Date = ym(Date)) %>%
  group_by(Date, Country, Type) %>%
  summarize(Sales = sum(Sales)) %>%
  arrange(Date) %>%
  mutate(sum_lag_2 = lag(Sales, 1, default = 0) + lag(Sales, 2, default = 0)) %>%
  ungroup()
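
To check why every value comes out as 0, it can help to count the rows per group after summarize(). Here is a minimal sketch on a small made-up data frame, assuming dplyr is installed; toy, summed, group_sizes and lagged are hypothetical names used only for illustration, not objects from the post above.

```r
library(dplyr)

# Hypothetical four-row stand-in for vol: one row per month, as in the post
toy <- data.frame(
  Date    = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01")),
  Country = c("CA", "CA", "US", "US"),
  Type    = c("A", "B", "A", "B"),
  Sales   = c(100, 110, 120, 130)
)

# summarize() drops only the last grouping variable (Type),
# so the result is still grouped by Date and Country
summed <- toy %>%
  group_by(Date, Country, Type) %>%
  summarize(Sales = sum(Sales), .groups = "drop_last")

# Rows per remaining group: every Date is unique, so each group has one row
group_sizes <- summed %>%
  summarize(n = n(), .groups = "drop")

# With one row per group there is no earlier row to lag to,
# so both lag() calls return default = 0
lagged <- summed %>%
  mutate(sum_lag_2 = lag(Sales, 1, default = 0) + lag(Sales, 2, default = 0)) %>%
  ungroup()
```

Printing group_sizes shows n equal to 1 for every group, which is why sum_lag_2 is 0 on every row of lagged.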

Thanks, @woodward!

Yes, absolutely right! I would like to have the lag computed within groups. But in this case, the values being added come from two different Types.

So, I finally split the dataset into individual groups using split(), and then the following works fine on each piece. Of course, I will need to recombine the data afterwards. But without splitting, the values don't come out right across the different groups.

If there is a better solution that avoids splitting and recombining the data, I would still like to know.

mutate(sum_lag_2 = lag(Sales, 1) + lag(Sales, 2))

Thank you!
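
One way to avoid split() and recombining might be to group by the variable(s) that define a series, sort by Date within each group, and only then call lag(). Below is a minimal sketch on made-up data, assuming dplyr is installed and assuming that Type alone defines a series (add Country to group_by() if each Country and Type pair should be lagged separately). The names toy, grouped and via_split are hypothetical.

```r
library(dplyr)

# Hypothetical data: two Types interleaved month by month
toy <- data.frame(
  Date  = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01",
                    "2018-05-01", "2018-06-01", "2018-07-01", "2018-08-01")),
  Type  = c("A", "B", "A", "B", "A", "B", "A", "B"),
  Sales = c(100, 110, 120, 130, 140, 150, 160, 170)
)

# Grouped version: inside mutate(), lag() only sees rows of the same Type
grouped <- toy %>%
  group_by(Type) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(sum_lag_2 = lag(Sales, 1) + lag(Sales, 2)) %>%
  ungroup()

# Split-and-recombine version, for comparison with the approach above
via_split <- split(toy, toy$Type) %>%
  lapply(function(d) {
    d %>%
      arrange(Date) %>%
      mutate(sum_lag_2 = lag(Sales, 1) + lag(Sales, 2))
  }) %>%
  bind_rows()
```

On this toy data both versions give the same sum_lag_2 column: the first two rows of each Type are NA because there are not yet two earlier values, and the third Type A row is 100 + 120 = 220. Passing default = 0 to lag() would replace those NAs with partial sums if that is preferred.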
