Help with Sum of last 10 years

Hello,

I am interested in getting sum of last 10 years sales for any selected year and month. I am struggling to do this without doing it manually. Can you please help?

Below example has been reduced for understanding purpose and in this case, I am displaying 4 years' worth data, and we would like to have sum of previous two years sales for any year and month.

library(tidyverse)
library(lubridate)

df <- data.frame(
          month = c("2018 Jan",
                    "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
                    "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
                    "2018 Dec","2019 Jan",
                    "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
                    "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
                    "2019 Dec","2020 Jan",
                    "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
                    "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
                    "2020 Dec", "2021 Jan",
                    "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
                    "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
                    "2021 Dec"),
          sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
                             220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
                             340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
                             460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

In this example, if I want to look at Dec 2021, the sum of last 2 years sales should be 10920 and for Jan 2020, it should be 5400. This should be applicable for any selected year and month. We can apply the same concept to get sales of previous 10 years.

Thanks in advance for helping with this!

Here is one way to do the calculation. A function could be written to take the date as an input and return the sum.

df <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
df$Date <- ym(df$month)
ChosenDate <- ymd("2021-12-01")
TwoYearOffset <- ChosenDate %m+% months(-23)
tmp <- df |> filter(Date >= TwoYearOffset, Date <= ChosenDate)
sum(tmp$sales)
#> [1] 10920

ChosenDate <- ymd("2020-01-01")
TwoYearOffset <- ChosenDate %m+% months(-23)
tmp <- df |> filter(Date >= TwoYearOffset, Date <= ChosenDate)
sum(tmp$sales)
#> [1] 5400

Created on 2021-12-06 by the reprex package (v2.0.1)

Update: replaces malformed DF object, and omits suggested soln, based on misunderstanding of question.

The error

Error in df$month: object of type 'closure' is not subsettable

from the earlier post illustrates the reason to avoid df, data and other built-ins when naming objects.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})
# don't use reserved names like df or data; it can cause problems 
DF <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

DF$month <- ym(DF$month)
DF
#>         month sales
#> 1  2018-01-01   100
#> 2  2018-02-01   110
#> 3  2018-03-01   120
#> 4  2018-04-01   130
#> 5  2018-05-01   140
#> 6  2018-06-01   150
#> 7  2018-07-01   160
#> 8  2018-08-01   170
#> 9  2018-09-01   180
#> 10 2018-10-01   190
#> 11 2018-11-01   200
#> 12 2018-12-01   210
#> 13 2019-01-01   220
#> 14 2019-02-01   230
#> 15 2019-03-01   240
#> 16 2019-04-01   250
#> 17 2019-05-01   260
#> 18 2019-06-01   270
#> 19 2019-07-01   280
#> 20 2019-08-01   290
#> 21 2019-09-01   300
#> 22 2019-10-01   310
#> 23 2019-11-01   320
#> 24 2019-12-01   330
#> 25 2020-01-01   340
#> 26 2020-02-01   350
#> 27 2020-03-01   360
#> 28 2020-04-01   370
#> 29 2020-05-01   380
#> 30 2020-06-01   390
#> 31 2020-07-01   400
#> 32 2020-08-01   410
#> 33 2020-09-01   420
#> 34 2020-10-01   430
#> 35 2020-11-01   440
#> 36 2020-12-01   450
#> 37 2021-01-01   460
#> 38 2021-02-01   470
#> 39 2021-03-01   480
#> 40 2021-04-01   490
#> 41 2021-05-01   500
#> 42 2021-06-01   510
#> 43 2021-07-01   520
#> 44 2021-08-01   530
#> 45 2021-09-01   540
#> 46 2021-10-01   550
#> 47 2021-11-01   560
#> 48 2021-12-01   570

Thanks @FJCC ! I would like to have a separate column that can show last 10 years values per row. So, we don't manually select a year and a month, but should be able to see sum of last 2 years data per year and month in this example.

Thanks @technocrat ! cumsum() might not work as it takes all the previous values into account. I would like to sum last 10 years values where data is monthly so year and month will still be considered. In the above example, we want to add last 2 years worth data for each row. cumsum() takes all previous years instead of adding just last 2 years plus I still want to retain months. So, for example, I would like to see last 2 years values for 2021 Aug.

Its definitely very tricky for me. Any help would be appreciated.

Thank you!

My results are off from yours, and I'm not sure why.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})
# don't use reserved names like df or data; it can cause problems 
DF <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

DF$month <- ym(DF$month)


find_prev_2yr <- function(x,y,z) {
  meas_date = ymd(paste(y,z,1,sep="-"))
  start = meas_date - dyears(2)
  prior_two = x[which(x$month >= start & x$month < meas_date),]
  return(sum(prior_two$sales))
}
  
find_prev_2yr(DF,2020,1)
#> [1] 5160
find_prev_2yr(DF,2021,12)
#> [1] 10350

Thanks @technocrat ! This is very helpful!
I believe mine is different because I was starting the addition from the selected year and month's value. If I wanted to include the selected date, then how would we modify your function?
Thank you so much!

Just this line

prior_two = x[which(x$month >= start & x$month <= meas_date),]

Check me on 2020-01, I still differ.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})
# don't use reserved names like df or data; it can cause problems 
DF <- data.frame(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

DF$month <- ym(DF$month)


find_prev_2yr <- function(x,y,z) {
  meas_date = ymd(paste(y,z,1,sep="-"))
  start = meas_date - dyears(2)
  prior_two = x[which(x$month >= start & x$month <= meas_date),]
  return(sum(prior_two$sales))
}

find_prev_2yr(DF,2020,1)
#> [1] 5500
find_prev_2yr(DF,2021,12)
#> [1] 10920

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

This can be solved using a combination of the clock and slider packages.

library(tidyverse)
library(clock)
library(slider)

df <- tibble(
  month = c("2018 Jan",
            "2018 Feb","2018 Mar","2018 Apr","2018 May","2018 Jun",
            "2018 Jul","2018 Aug","2018 Sep","2018 Oct","2018 Nov",
            "2018 Dec","2019 Jan",
            "2019 Feb","2019 Mar","2019 Apr","2019 May","2019 Jun",
            "2019 Jul","2019 Aug","2019 Sep","2019 Oct","2019 Nov",
            "2019 Dec","2020 Jan",
            "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
            "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
            "2020 Dec", "2021 Jan",
            "2021 Feb","2021 Mar","2021 Apr","2021 May","2021 Jun",
            "2021 Jul","2021 Aug","2021 Sep","2021 Oct","2021 Nov",
            "2021 Dec"),
  sales = c(100,110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
            220, 230,240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
            340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
            460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570)
)

# Convert the character `month` column into a "year-month" column using
# `clock::year_month_day_parse()`
df <- df %>%
  mutate(month = year_month_day_parse(month, format = "%Y %B", precision = "month"))

head(df)
#> # A tibble: 6 × 2
#>   month        sales
#>   <ymd<month>> <dbl>
#> 1 2018-01        100
#> 2 2018-02        110
#> 3 2018-03        120
#> 4 2018-04        130
#> 5 2018-05        140
#> 6 2018-06        150

# This says "take the current month and 23 months of data before it".
# This adds up to 24 months total, and matches what you say you are expecting as answers.
before <- duration_months(23)

# Compute a rolling sum using `slider::slide_index_sum()`.
# The "index" in the name means that we compute this relative to an index,
# which here is the `month` column. We compute the window as [i - before, i].
# `complete = TRUE` means that we don't compute partial sums (i.e. it is why
# you get the NAs at the beginning).
df <- df %>%
  mutate(
    rolling = slide_index_sum(
      x = sales, 
      i = month, 
      before = before, 
      complete = TRUE
    )
  )

print(df, n = Inf)
#> # A tibble: 48 × 3
#>    month        sales rolling
#>    <ymd<month>> <dbl>   <dbl>
#>  1 2018-01        100      NA
#>  2 2018-02        110      NA
#>  3 2018-03        120      NA
#>  4 2018-04        130      NA
#>  5 2018-05        140      NA
#>  6 2018-06        150      NA
#>  7 2018-07        160      NA
#>  8 2018-08        170      NA
#>  9 2018-09        180      NA
#> 10 2018-10        190      NA
#> 11 2018-11        200      NA
#> 12 2018-12        210      NA
#> 13 2019-01        220      NA
#> 14 2019-02        230      NA
#> 15 2019-03        240      NA
#> 16 2019-04        250      NA
#> 17 2019-05        260      NA
#> 18 2019-06        270      NA
#> 19 2019-07        280      NA
#> 20 2019-08        290      NA
#> 21 2019-09        300      NA
#> 22 2019-10        310      NA
#> 23 2019-11        320      NA
#> 24 2019-12        330    5160
#> 25 2020-01        340    5400
#> 26 2020-02        350    5640
#> 27 2020-03        360    5880
#> 28 2020-04        370    6120
#> 29 2020-05        380    6360
#> 30 2020-06        390    6600
#> 31 2020-07        400    6840
#> 32 2020-08        410    7080
#> 33 2020-09        420    7320
#> 34 2020-10        430    7560
#> 35 2020-11        440    7800
#> 36 2020-12        450    8040
#> 37 2021-01        460    8280
#> 38 2021-02        470    8520
#> 39 2021-03        480    8760
#> 40 2021-04        490    9000
#> 41 2021-05        500    9240
#> 42 2021-06        510    9480
#> 43 2021-07        520    9720
#> 44 2021-08        530    9960
#> 45 2021-09        540   10200
#> 46 2021-10        550   10440
#> 47 2021-11        560   10680
#> 48 2021-12        570   10920
2 Likes