Calculate a Rolling Slope In R

Hi,
I'm trying to calculate a rolling 7-day slope.
I am using the following slope function:

rollingSlope.lm.fit <- function(vector) {
a <- coef(.lm.fit(cbind(1, seq(vector)), vector))[2]
return(a)
}

data$seven_day_slope <- rollapply(data$variable, width = 7, FUN = rollingSlope.lm.fit, fill = NA)

My dataset has 46 rows and 2 columns, variable and date.
When I calculate the slope over a rolling 7-day window, the result is NA for the top 3 and bottom 3 values in the dataset.
I'd expect the oldest 6 values to be NA instead, as they are used to calculate the first 7-day value.

Has anyone had a similar encounter?

rollapply() defaults to a centered alignment. You want a right alignment. You can get that by specifying align = "right" or by using rollapplyr(), which they added because this was so common.

library(zoo, warn.conflicts = FALSE)

x <- 1:10

rollapply(x, width = 7, FUN = mean, fill = NA)
#>  [1] NA NA NA  4  5  6  7 NA NA NA

rollapply(x, width = 7, FUN = mean, fill = NA, align = "right")
#>  [1] NA NA NA NA NA NA  4  5  6  7

rollapplyr(x, width = 7, FUN = mean, fill = NA)
#>  [1] NA NA NA NA NA NA  4  5  6  7
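
Applied to your slope function, the same fix might look like the sketch below. The data frame `dat` and its perfectly linear `variable` column are made up for illustration (46 rows, like yours), and I pull the coefficient straight out of the `.lm.fit()` result rather than going through `coef()`.

```r
library(zoo, warn.conflicts = FALSE)

# Slope of a simple regression of the window's values on their position (1, 2, ..., n)
rolling_slope <- function(v) {
  .lm.fit(cbind(1, seq_along(v)), v)$coefficients[2]
}

# Hypothetical stand-in for the real data: 46 rows, true slope of 2 per day
dat <- data.frame(
  date = as.Date("2020-01-01") + 0:45,
  variable = 2 * (1:46) + 5
)

# Right-aligned window: each row uses its own value plus the 6 values before it
dat$seven_day_slope <- rollapplyr(
  dat$variable,
  width = 7,
  FUN = rolling_slope,
  fill = NA
)

# Only the 6 oldest rows are NA now
sum(is.na(dat$seven_day_slope))
head(dat, 8)
```

With a right alignment the NAs all sit at the start of the series, which is what you expected.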

I would also encourage you to look at {slider} (I wrote it), which has a few more features than rollapply(). For instance, you can roll rowwise over entire data frames, and return more complex objects, like the entire lm object. This way you don't have to just return the coefficient value if you are also interested in other components of the model.

Here is a 7 day rolling regression with slider, where we return the entire lm model. You set .before = 6 to indicate that you want the current value + 6 values before it, and you set .complete = TRUE to indicate that you only want to compute the regression on complete windows (i.e. where you have 7 rows of data).

library(slider)
library(tibble, warn.conflicts = FALSE)

set.seed(123)

df <- tibble(
  date = as.Date("2019-01-01") + 0:49,
  outcome = rnorm(50),
  value = rnorm(50)
)

df$models <- slide(
  df, 
  ~lm(outcome ~ value, data = .x), 
  .before = 6, 
  .complete = TRUE
)

df
#> # A tibble: 50 x 4
#>    date       outcome   value models
#>    <date>       <dbl>   <dbl> <list>
#>  1 2019-01-01 -0.560   0.253  <NULL>
#>  2 2019-01-02 -0.230  -0.0285 <NULL>
#>  3 2019-01-03  1.56   -0.0429 <NULL>
#>  4 2019-01-04  0.0705  1.37   <NULL>
#>  5 2019-01-05  0.129  -0.226  <NULL>
#>  6 2019-01-06  1.72    1.52   <NULL>
#>  7 2019-01-07  0.461  -1.55   <lm>  
#>  8 2019-01-08 -1.27    0.585  <lm>  
#>  9 2019-01-09 -0.687   0.124  <lm>  
#> 10 2019-01-10 -0.446   0.216  <lm>  
#> # … with 40 more rows

df$models[[7]]
#> 
#> Call:
#> lm(formula = outcome ~ value, data = .x)
#> 
#> Coefficients:
#> (Intercept)        value  
#>      0.4156       0.1816

Created on 2020-03-20 by the reprex package (v0.3.0)
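
If you only need the slope from each window, you could pull it out of the list of models afterwards, or compute it directly with slide_dbl(). This is a minimal sketch, assuming the same simulated data as above (built as a plain data frame here so the snippet stands alone); the names `models`, `slopes` and `slopes_direct` are just for illustration.

```r
library(slider)

set.seed(123)

df <- data.frame(
  date = as.Date("2019-01-01") + 0:49,
  outcome = rnorm(50),
  value = rnorm(50)
)

# One lm per complete 7-row window; incomplete windows come back as NULL
models <- slide(
  df,
  ~lm(outcome ~ value, data = .x),
  .before = 6,
  .complete = TRUE
)

# Extract the slope on `value`, mapping the NULL entries to NA
slopes <- vapply(
  models,
  function(m) if (is.null(m)) NA_real_ else coef(m)[["value"]],
  numeric(1)
)

# Or skip the list of models and return the slope in one step
slopes_direct <- slide_dbl(
  df,
  ~coef(lm(outcome ~ value, data = .x))[["value"]],
  .before = 6,
  .complete = TRUE
)

head(slopes, 8)
```

Keeping the full models around is handy if you later want residuals or standard errors; slide_dbl() is the lighter option when the slope is all you need.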

Aaahhhh... so close. Thanks very much, that worked a treat. Now starting to compare it to the same calculation in SQL, which should be interesting.
Thanks, Vincent
