Help with Sum of last 10 years

Hello,

I am interested in getting sum of last 10 years sales for any selected year and month. I am struggling to do this without doing it manually. Can you please help?

Below example has been reduced for understanding purpose and in this case, I am displaying 4 years' worth data, and we would like to have sum of previous two years sales for any year and month.

library(tidyverse)
library(lubridate)

df <- data.frame(
          month = c("2018 Jan",
                    "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
                    "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
                    "2018 Dec","2019 Jan",
                    "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
                    "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
                    "2019 Dec","2020 Jan",
                    "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
                    "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
                    "2020 Dec", "2021 Jan",
                    "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
                    "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
                    "2021 Dec"),
          sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
                             220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
                             340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
                             460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

In this example, if I want to look at Dec 2021, the sum of last 2 years sales should be 10920 and for Jan 2020, it should be 5400. This should be applicable for any selected year and month. We can apply the same concept to get sales of previous 10 years.

Thanks in advance for helping with this!

Here is one way to do the calculation. A function could be written to take the date as an input and return the sum.

df <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
df$Date <- ym(df$month)
ChosenDate <- ymd("2021-12-01")
TwoYearOffset <- ChosenDate %m+% months(-23)
tmp <- df |> filter(Date >= TwoYearOffset, Date <= ChosenDate)
sum(tmp$sales)
#> [1] 10920

ChosenDate <- ymd("2020-01-01")
TwoYearOffset <- ChosenDate %m+% months(-23)
tmp <- df |> filter(Date >= TwoYearOffset, Date <= ChosenDate)
sum(tmp$sales)
#> [1] 5400

Created on 2021-12-06 by the reprex package (v2.0.1)

Update: replaces malformed DF object, and omits suggested soln, based on misunderstanding of question.

The error

Error in df$month: object of type 'closure' is not subsettable

from the earlier post illustrates the reason to avoid df, data and other built-ins when naming objects.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})
# don't use reserved names like df or data; it can cause problems 
DF <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

DF$month <- ym(DF$month)
DF
#>         month sales
#> 1  2018-01-01   100
#> 2  2018-02-01   110
#> 3  2018-03-01   120
#> 4  2018-04-01   130
#> 5  2018-05-01   140
#> 6  2018-06-01   150
#> 7  2018-07-01   160
#> 8  2018-08-01   170
#> 9  2018-09-01   180
#> 10 2018-10-01   190
#> 11 2018-11-01   200
#> 12 2018-12-01   210
#> 13 2019-01-01   220
#> 14 2019-02-01   230
#> 15 2019-03-01   240
#> 16 2019-04-01   250
#> 17 2019-05-01   260
#> 18 2019-06-01   270
#> 19 2019-07-01   280
#> 20 2019-08-01   290
#> 21 2019-09-01   300
#> 22 2019-10-01   310
#> 23 2019-11-01   320
#> 24 2019-12-01   330
#> 25 2020-01-01   340
#> 26 2020-02-01   350
#> 27 2020-03-01   360
#> 28 2020-04-01   370
#> 29 2020-05-01   380
#> 30 2020-06-01   390
#> 31 2020-07-01   400
#> 32 2020-08-01   410
#> 33 2020-09-01   420
#> 34 2020-10-01   430
#> 35 2020-11-01   440
#> 36 2020-12-01   450
#> 37 2021-01-01   460
#> 38 2021-02-01   470
#> 39 2021-03-01   480
#> 40 2021-04-01   490
#> 41 2021-05-01   500
#> 42 2021-06-01   510
#> 43 2021-07-01   520
#> 44 2021-08-01   530
#> 45 2021-09-01   540
#> 46 2021-10-01   550
#> 47 2021-11-01   560
#> 48 2021-12-01   570

Thanks @FJCC ! I would like to have a separate column that can show last 10 years values per row. So, we don't manually select a year and a month, but should be able to see sum of last 2 years data per year and month in this example.

Thanks @technocrat ! cumsum() might not work as it takes all the previous values into account. I would like to sum last 10 years values where data is monthly so year and month will still be considered. In the above example, we want to add last 2 years worth data for each row. cumsum() takes all previous years instead of adding just last 2 years plus I still want to retain months. So, for example, I would like to see last 2 years values for 2021 Aug.

Its definitely very tricky for me. Any help would be appreciated.

Thank you!

My results are off from yours, and I'm not sure why.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})
# don't use reserved names like df or data; it can cause problems 
DF <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

DF$month <- ym(DF$month)


find_prev_2yr <- function(x,y,z) {
  meas_date = ymd(paste(y,z,1,sep="-"))
  start = meas_date - dyears(2)
  prior_two = x[which(x$month >= start & x$month < meas_date),]
  return(sum(prior_two$sales))
}
  
find_prev_2yr(DF,2020,1)
#> [1] 5160
find_prev_2yr(DF,2021,12)
#> [1] 10350

Thanks @technocrat ! This is very helpful!
I believe mine is different because I was starting the addition from the selected year and month's value. If I wanted to include the selected date, then how would we modify your function?
Thank you so much!

Just this line

prior_two = x[which(x$month >= start & x$month <= meas_date),]

Check me on 2020-01, I still differ.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})
# don't use reserved names like df or data; it can cause problems 
DF <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

DF$month <- ym(DF$month)


find_prev_2yr <- function(x,y,z) {
  meas_date = ymd(paste(y,z,1,sep="-"))
  start = meas_date - dyears(2)
  prior_two = x[which(x$month >= start & x$month <= meas_date),]
  return(sum(prior_two$sales))
}

find_prev_2yr(DF,2020,1)
#> [1] 5500
find_prev_2yr(DF,2021,12)
#> [1] 10920

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.