Moving average and function to source question

Hi to all,
I am trying to add a moving mean average per two observations added to my code.
how do i do that?

also , i have 4 different excel files .
One file is named MonthlySales1.xls , the second file is named MonthlySales2.xls
They contain different numbers

still , when i run the code for MonthlySales2.xls , i get to see the results from MonthlySales1.xls
something is not right in my code....who can tell me what to amend?

**

load needed packages

library(readr)
library(dplyr)
library(ggplot2)
library(forecast)
library(ggplot2)

import data

sales <- readxl("C:\Users\jenny\Desktop\R Program\Forecast calcs\MonthlySales2.xls")

explore data

str(sales)

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(2+1):length(cx)] - cx[1:(length(cx) - 2)]
rn <- cn[(2+1):length(cx)] - cn[1:(length(cx) - 2)]
rsum <- rx / rn

head(sales, n = 5)
Classes'tbl_df','tbl' and 'data.frame': 96 obs. of 2 variables:
month: Date, format: "1982-06-12" "1982-06-19" ... sales: num 14237 4520 55691 28295 23648 ...

  • attr(*, "spec")=
    .. cols(
    .. month = col_date(format = ""),
    .. sales = col_double()
    .. )
    options(repr.plot.width = 6, repr.plot.height = 3)
    ggplot(sales, aes(x = month, y = sales)) + geom_line() + geom_smooth(method = 'lm') +labs(x = "Time", y = "Monthly Sales")
    salesTS <- ts(sales$sales, frequency = 4, start = c(2019,1))
    class(salesTS)
    'ts'
    options(repr.plot.width = 6, repr.plot.height = 5)
    salesDecomp <- decompose(salesTS)
    autoplot(salesDecomp)

logging transform time series data

salesLog <- log(salesTS)

salesLogHW <- ets(salesLog)
salesLogHW
ets(x = salesLog)

sales <- sqrt(salesLogHW$dispersion)
dnorm(7, mean = coef(m1), sd = sdest)

salesLogHW <- HoltWinters(salesLog)
salesLogHW
HoltWinters(x = salesLog)

options(repr.plot.width = 6, repr.plot.height = 4)
autolayer(salesLogHW)

forecast next year's sales

nextYearSales <- forecast(salesLogHW, h=4)

plot

autolayer(nextYearSales)
nextYearSales
**

It's much better if you post a minimal reproducible example i.e. only the code required to illustrate the specific problem that you are encountering. Asking people to read through your entire code and find out what's wrong isn't a good approach (nor is it possible without a reprex).

Since you haven't provided a reprex, I'll illustrate how to calculate rolling means on the iris data set. You can adapt it to your sales data frame.

library(zoo, warn.conflicts = FALSE)

iris$Sepal.Length.Mean <- rollmean(iris$Sepal.Length, k = 2, fill = NA, align = "right")

head(iris[, c(1, 6)], n = 10)
#>    Sepal.Length Sepal.Length.Mean
#> 1           5.1                NA
#> 2           4.9              5.00
#> 3           4.7              4.80
#> 4           4.6              4.65
#> 5           5.0              4.80
#> 6           5.4              5.20
#> 7           4.6              5.00
#> 8           5.0              4.80
#> 9           4.4              4.70
#> 10          4.9              4.65

Created on 2020-04-30 by the reprex package (v0.3.0)

Note that the first value of Sepal.Length.Mean has been filled with NA. You control its value & position by changing the fill and align parameters respectively.

As for your second question, I think a good way to diagnose would be to run each line of code step-by-step after reading each Excel file and observe how the data frame is changing. That should help narrow down the issue.

1 Like

Thank you! That helps narrow it down as we can see that the error is thrown by readxl().

The problem is that your file paths contain \. This is a reserved character in R and you need to tell R to interpret it as a literal \ by "escaping" each \. So you need to modify your readxl() call to this:

readxl("C:\\Users\\jenny\\Desktop\\R Program\\Forecast calcs\\MonthlySales1.xls")

Or you could use / instead of \. With this approach you don't need to double the slashes.

readxl("C:/Users/jenny/Desktop/R Program/Forecast calcs/MonthlySales1.xls")
1 Like

Hi, great ,
so looking at your code above my code should look like this?


library(zoo, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(slider, warn.conflicts = FALSE)
library(readxl)

setwd('C:\Users\jenny\Desktop\R Program\Forecast calcs\MonthlySales1.xls')

readxl("C:\Users\jenny\Desktop\R Program\Forecast calcs\MonthlySales1.xls")

MonthlySales1 <- data.frame(sales1)
MonthlySales1$Sales.Mean <- rollmean(iris$Sales, k = 2, fill = 1, align = "right")

head(iris[, c(1, 6)], n = 94)


something particlar is going on,as it cant seem to connectto the xls file...

error says: Error in readxl("C:\Users\jenny\Desktop\R Program\Forecast calcs\MonthlySales1.xls") :
could not find function "readxl"

I feel such a dumbo ,it is probably a simple error i've made:-)

It sounds like you may be relatively new to using R. If so, I highly recommend the book R for Data Science which is available for free on the web: https://r4ds.had.co.nz/

As to your code, this is should accomplish what you're looking for:

library(readxl)
library(zoo)

MonthlySales1 <- read_excel("C:/Users/jenny/Desktop/R Program/Forecast calcs/MonthlySales1.xls")
MonthlySales1$Sales.Mean <- rollmean(MonthlySales1$sales, k = 2, fill = 1, align = "right")
MonthlySales1
1 Like

yeaeeeaah
this is a working version.thanks you so much. yes ,indeed i am still learning every day :slight_smile:
cheers

1 Like

Hi Matt,

I have an alpha,beta and gamma calculation in my code which calculates alpha,beta and gamma over the whole time series.
if i want to have an moving alpha , beta and gamma (ABG) calculated over every former 20 observations and have that i my matrix ,how can i rewrite that piece of code ?

I hope i have provided you enough information.
I know i am a beginner, so i have lots to learn :slight_smile:

as you now it's connected to MonthlySales1.xls

the lay out of MonthlySales1xls looks like this:

Month sales alpha beta gamma
2019-01-02 13
2019-01-05 4
2019-01-09 1
2019-01-12 10
2019-01-16 4
2019-01-19 2
2019-01-23 3
2019-01-26 3
2019-01-30 3
2019-02-02 14
2019-02-06 1
2019-02-09 1
2019-02-13 5
2019-02-16 3
2019-02-20 21
2019-02-23 5
2019-02-27 4
2019-03-02 4
2019-03-06 1
2019-03-09 22 ? / ? / ? (abg of former 20 obervations)
2019-03-13 4 ? / ? / ? (abg of former 20 obervations)
2019-03-16 9 ? / ? / ? (abg of former 20 obervations)
2019-03-20 2 ? / ? / ? (abg of former 20 obervations)
2019-03-23 3 etc.etc.
2019-03-27 16
2019-03-30 7
2019-04-03 12
2019-04-06 58
2019-04-10 10
2019-04-13 4
2019-04-17 10
2019-04-20 11
2019-04-24 6
2019-04-27 5
2019-05-01 23
2019-05-04 3
2019-05-08 58
2019-05-11 3
2019-05-15 5
2019-05-18 7
2019-05-22 13
2019-05-25 12
2019-05-29 4
2019-06-01 16
2019-06-05 6
2019-06-08 9
2019-06-12 22
2019-06-15 10
2019-06-19 21
2019-06-22 69
2019-06-26 4
2019-06-29 2
2019-07-03 13
2019-07-06 10
2019-07-10 7
2019-07-13 5
2019-07-17 21
2019-07-20 4
2019-07-24 2
2019-07-27 1
2019-07-31 2
2019-08-03 4
2019-08-07 5
2019-08-10 2
2019-08-14 69
2019-08-17 4
2019-08-21 2
2019-08-24 3
2019-08-28 1
2019-08-31 2
2019-09-04 1
2019-09-07 78
2019-09-11 2
2019-09-14 6
2019-09-18 3
2019-09-21 1
2019-09-25 16
2019-09-28 1
2019-10-02 20
2019-10-05 14
2019-10-09 5
2019-10-12 78
2019-10-16 2
2019-10-19 2
2019-10-23 11
2019-10-26 4
2019-10-30 23
2019-11-02 8
2019-11-06 6

my code looks like this:

load needed packages

library(readr)
library(dplyr)
library(ggplot2)
library(forecast)
library(readxl)
library(zoo)

import data

MonthlySales1 <- read_excel("C:/Users/jenny/Desktop/R Program/Forecast calcs/MonthlySales1.xls")

explore data

str(sales)

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(2+1):length(cx)] - cx[1:(length(cx) - 2)]
rn <- cn[(2+1):length(cx)] - cn[1:(length(cx) - 2)]
rsum <- rx / rn

head(sales, n = 5)
Classes'tbl_df','tbl' and 'data.frame': 96 obs. of 9 variables:
month: Date, format: "2019-06-12" "2019-06-19" ... sales: num 14237 4520 55691 28295 23648 ...

  • attr(*, "spec")=
    .. cols(
    .. month = col_date(format = ""),
    .. sales = col_double()
    .. )
    options(repr.plot.width = 6, repr.plot.height = 3)
    ggplot(sales, aes(x = month, y = sales)) + geom_line() + geom_smooth(method = 'lm') +labs(x = "Time", y = "Monthly Sales")
    salesTS <- ts(sales$sales, frequency = 4, start = c(2019,1))
    class(salesTS)
    'ts'
    options(repr.plot.width = 6, repr.plot.height = 5)
    salesDecomp <- decompose(salesTS)
    autoplot(salesDecomp)

logging transform time series data

salesLog <- log(salesTS)

salesLogHW <- ets(salesLog)
salesLogHW
ets(x = salesLog)

sales <- sqrt(salesLogHW$dispersion)
dnorm(7, mean = coef(m1), sd = sdest)

salesLogHW <- HoltWinters(salesLog)
salesLogHW
HoltWinters(x = salesLog)

MonthlySales1 <- read_excel("C:/Users/jenny/Desktop/R Program/Forecast calcs/MonthlySales1.xls")
MonthlySales1$sales.mean<- rollmean(MonthlySales1$sales, k = 2, fill = 1, align = "right")
MonthlySales1

options(repr.plot.width = 6, repr.plot.height = 4)
autolayer(salesLogHW)

forecast next year's sales

nextYearSales <- forecast(salesLogHW, h=4)

plot

autolayer(nextYearSales)
nextYearSales


output:

library(dplyr)

load needed packages

library(readr)
library(dplyr)
library(ggplot2)
library(forecast)
library(readxl)
library(zoo)

import data

MonthlySales1 <- read_excel("C:/Users/jenny/Desktop/R Program/Forecast calcs/MonthlySales1.xls")

explore data

str(sales)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 3 variables:
sales : num 6 5 18 8 43 61 95 139 45 88 ... return : num 2 3 4 5 8 1 16 19 6 34 ...
$ AlphaHW: num 1 2 6 3 3 7 2 49 49 73 ...

  • attr(*, "spec")=
    .. cols(
    .. sales = col_double(),
    .. sales = col_double(),
    .. return = col_double()
    .. )

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(2+1):length(cx)] - cx[1:(length(cx) - 2)]
rn <- cn[(2+1):length(cx)] - cn[1:(length(cx) - 2)]
rsum <- rx / rn

head(sales, n = 5)

A tibble: 5 x 3

sales return AlphaHW

1 6 2 1
2 5 3 2
3 18 4 6
4 8 5 3
5 43 8 3

Classes'tbl_df','tbl' and 'data.frame': 96 obs. of 9 variables:
Error: unexpected string constant in "Classes'tbl_df'"
month: Date, format: "2019-06-12" "2019-06-19" ... Error: unexpected '' in " " sales: num 14237 4520 55691 28295 23648 ...
Error: unexpected '' in ""

  • attr(, "spec")=
    Error: unexpected '
    ' in "- attr(*"
    .. cols(
    Error: unexpected symbol in " .. cols"
    .. month = col_date(format = ""),
    Error: unexpected symbol in " .. month"
    .. sales = col_double()
    Error: unexpected symbol in " .. sales"
    .. )
    Error: unexpected ')' in " .. )"
    options(repr.plot.width = 6, repr.plot.height = 3)
    ggplot(sales, aes(x = month, y = sales)) + geom_line() + geom_smooth(method = 'lm') +labs(x = "Time", y = "Monthly Sales")
    Error in FUN(X[[i]], ...) : object 'month' not found
    salesTS <- ts(sales$sales, frequency = 4, start = c(2019,1))
    class(salesTS)
    [1] "ts"
    'ts'
    [1] "ts"
    options(repr.plot.width = 6, repr.plot.height = 5)
    salesDecomp <- decompose(salesTS)
    autoplot(salesDecomp)

logging transform time series data

salesLog <- log(salesTS)

salesLogHW <- ets(salesLog)
salesLogHW
ETS(M,Ad,N)

Call:
ets(y = salesLog)

Smoothing parameters:
alpha = 0.1434
beta = 1e-04
phi = 0.8008

Initial states:
l = 1.4663
b = 0.7245

sigma: 0.3113

 AIC     AICc      BIC 

166.4996 169.0450 176.6328

ets(x = salesLog)
Error in class(y) %in% c("data.frame", "list", "matrix", "mts") :
argument "y" is missing, with no default

sales <- sqrt(salesLogHW$dispersion)
Error in sqrt(salesLogHW$dispersion) :
non-numeric argument to mathematical function
dnorm(7, mean = coef(m1), sd = sdest)
Error in coef(m1) : object 'm1' not found

salesLogHW <- HoltWinters(salesLog)
salesLogHW
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = salesLog)

Smoothing parameters:
alpha: 0.3636345
beta : 0.2096302
gamma: 0.4300818

Coefficients:
[,1]
a 3.8519386
b -0.1149221
s1 -0.6216912
s2 -0.3063579
s3 -1.0720958
s4 -0.3704739

HoltWinters(x = salesLog)
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = salesLog)

Smoothing parameters:
alpha: 0.3636345
beta : 0.2096302
gamma: 0.4300818

Coefficients:
[,1]
a 3.8519386
b -0.1149221
s1 -0.6216912
s2 -0.3063579
s3 -1.0720958
s4 -0.3704739

MonthlySales1 <- read_excel("C:/Users/jenny/Desktop/R Program/Forecast calcs/MonthlySales1.xls")
MonthlySales1$sales.mean<- rollmean(MonthlySales1$sales, k = 2, fill = 1, align = "right")
MonthlySales1

A tibble: 95 x 9

Month sales Forecast fc alpha fc beta

1 2019-01-02 00:00:00 13 NA NA NA
2 2019-01-05 00:00:00 4 NA NA NA
3 2019-01-09 00:00:00 1 NA NA NA
4 2019-01-12 00:00:00 10 NA NA NA
5 2019-01-16 00:00:00 4 NA NA NA
6 2019-01-19 00:00:00 2 NA NA NA
7 2019-01-23 00:00:00 3 NA NA NA
8 2019-01-26 00:00:00 3 NA NA NA
9 2019-01-30 00:00:00 3 NA NA NA
10 2019-02-02 00:00:00 14 NA NA NA

... with 85 more rows, and 4 more variables: `fc

gamma` , Low , High , sales.mean

options(repr.plot.width = 6, repr.plot.height = 4)
autolayer(salesLogHW)
Error: Objects of type HoltWinters not supported by autolayer.
Run rlang::last_error() to see where the error occurred.

forecast next year's sales

nextYearSales <- forecast(salesLogHW, h=4)

plot

autolayer(nextYearSales)
mapping: x = ~x, y = ~y, level = ~level, ymin = ~ymin, ymax = ~ymax
geom_forecast: na.rm = FALSE
stat_identity: na.rm = FALSE
position_identity
nextYearSales
Point Forecast Lo 80 Hi 80 Lo 95
2029 Q1 3.115325 1.2080498 5.022601 0.1983992
2029 Q2 3.315737 1.2321049 5.399368 0.1290970
2029 Q3 2.435076 0.1306404 4.739513 -1.0892543
2029 Q4 3.021776 0.4553214 5.588231 -0.9032777
Hi 95
2029 Q1 6.032251
2029 Q2 6.502376
2029 Q3 5.959407
2029 Q4 6.946830

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

Hi, thank you for your support.
where in the code can i place the code for the moving average ?

bte
My excel file Monthlysales1.xls looks like this.Its very basic.
(it should pull data only from :C:\Users\jenny\Desktop\R Program\Forecast calcs\MonthlySales1.xls

it shouldnt take any other data.

Month sales
02/01/2019 12
05/01/2019 2
09/01/2019 6
12/01/2019 8
16/01/2019 9
19/01/2019 45
23/01/2019 69
26/01/2019 23
30/01/2019 23
02/02/2019 56
06/02/2019 56
09/02/2019 238
13/02/2019 89
16/02/2019 9
20/02/2019 9
23/02/2019 30
27/02/2019 23
02/03/2019 56
06/03/2019 56
09/03/2019 12
13/03/2019 128
16/03/2019 3
20/03/2019 3
23/03/2019 3
27/03/2019 5
30/03/2019 6
03/04/2019 9
06/04/2019 9
10/04/2019 9
13/04/2019 9
17/04/2019 9
20/04/2019 9
24/04/2019 9
27/04/2019 9
01/05/2019 58.2
04/05/2019 2
08/05/2019 4
11/05/2019 4
15/05/2019 5
18/05/2019 5
22/05/2019 5
25/05/2019 5
29/05/2019 5
01/06/2019 5
05/06/2019 5
08/06/2019 5
12/06/2019 8
15/06/2019 8
19/06/2019 89
22/06/2019 96
26/06/2019 8
29/06/2019 7
03/07/2019 7
06/07/2019 8
10/07/2019 9
13/07/2019 5
17/07/2019 5
20/07/2019 7
24/07/2019 7
27/07/2019 8
31/07/2019 6
03/08/2019 65
07/08/2019 52
10/08/2019 44
14/08/2019 7
17/08/2019 7
21/08/2019 8
24/08/2019 8
28/08/2019 6
31/08/2019 2
04/09/2019 25
07/09/2019 4
11/09/2019 7
14/09/2019 7
18/09/2019 898
21/09/2019 6
25/09/2019 63
28/09/2019 2
02/10/2019 4
05/10/2019 7
09/10/2019 8
12/10/2019 13

I will l try to make a reprex and send that over.

What is the name of the data frame that holds this table? That name should be used in place of iris.

The name of the variable over which the rolling mean should be calculated (which appears to be sales from your previous post) should be used instead of Sepal.Length.

Since I've recently started using it, I thought I would add another way to create a moving average column using the slider package. If you worked with purrr before, the slider API is very familiar. So here is another approach that is equivalen to @siddharthprabhu's code.

library(dplyr, warn.conflicts = FALSE)
library(slider, warn.conflicts = FALSE)

iris %>% 
  mutate(Sepal.Length.Mean = slide_dbl(Sepal.Length, ~ mean(.x),
                                       .before = 1, .complete = TRUE)) %>% 
  select(1, 6) %>% 
  head(10)
#>    Sepal.Length Sepal.Length.Mean
#> 1           5.1                NA
#> 2           4.9              5.00
#> 3           4.7              4.80
#> 4           4.6              4.65
#> 5           5.0              4.80
#> 6           5.4              5.20
#> 7           4.6              5.00
#> 8           5.0              4.80
#> 9           4.4              4.70
#> 10          4.9              4.65

Created on 2020-04-30 by the reprex package (v0.3.0)

1 Like

I have tested it. Being a learner , I have the following code:

Input
library(zoo, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(slider, warn.conflicts = FALSE)

readxl("C:\Users\jenny\Desktop\R Program\Forecast calcs\MonthlySales1.xls")
MonthlySales1 <- data.frame(sales1)
MonthlySales1$Sales.Mean <- rollmean(iris$Sales, k = 2, fill = 1, align = "right")

head(iris[, c(1, 6)], n = 94)

output

i have some error messages, whats are my errors here?

readxl("C:\Users\jenny\Desktop\R Program\Forecast calcs\MonthlySales1.xls")
Error: '\U' used without hex digits in character string starting ""C:\U"
MonthlySales1 <- data.frame(sales1)
MonthlySales1$Sales.Mean <- rollmean(iris$Sales, k = 2, fill = 1, align = "right")

head(iris[, c(1, 6)], n = 94)

Sepal.Length Sepal.Length.Mean
1 5.1 NA
2 4.9 5.00
3 4.7 4.80
4 4.6 4.65
5 5.0 4.80
6 5.4 5.20
7 4.6 5.00
8 5.0 4.80
9 4.4 4.70
10 4.9 4.65
11 5.4 5.15
12 4.8 5.10
13 4.8 4.80

thanks
I will let you know if it works