Multiple rolling mean windows group_by

purrr
tidyverse

#1

Using data(FANG), say I know that there is a smoothed relationship between volume and opening price. I also know that the length of the most predictive rolling mean varies by stock: for some it is short, say 2 days; for others it is 10. I'd like to create multiple rolling means, with lengths between 2 and 10 days, for each stock.

So far I have tried the tibbletime package and made a start: I can calculate the multiple rolling means for one stock.

FB <- FANG %>% filter(symbol == "FB")

meanstep <- seq(2, 10, 1)

col_names <- map_chr(meanstep, ~paste0("rollmean_", .x))

rollers <- map(meanstep, ~rollify(mean, window = .x)) %>% set_names(nm = col_names)

FB_multiroll <- bind_cols(FB, invoke_map(rollers, x = FB$volume))

However, I can’t seem to figure out how to make this work when grouping by multiple stocks.

I tried adding:

FANG_with_multiroll <- FANG %>%
    group_by(symbol) %>%
    bind_cols(FANG, invoke_map(rollers, x = FANG$volume))

But that didn't work. Any ideas would be appreciated. Once I get it to work, I plan on finding the highest correlation or R-squared for each symbol. If you have ideas about better ways to do that too, I'm interested.


#2

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that we can fix it: please help us help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.


#3

I see you've found my Gist.

Here is one way that uses rlang's !!! to splice the rollers into funs(), which is then passed to mutate_at(). A neat solution. Making sure that the rollers are named is important here, because those names become the names of the new columns!

library(tibbletime)
library(dplyr)
library(purrr)
data(FANG)

# Create the column names
col_names <- map_chr(2:10, ~paste0("adjusted_", .x))

# Create the rolling functions and assign them names
rollers <- map(2:10, ~rollify(mean, window = .x)) %>%
  set_names(nm = col_names)

# We can create our named function list with funs() and splicing
funs(!!!rollers)
#> <fun_calls>
#> $ adjusted_2 : (function (...) ...
#> $ adjusted_3 : (function (...) ...
#> $ adjusted_4 : (function (...) ...
#> $ adjusted_5 : (function (...) ...
#> $ adjusted_6 : (function (...) ...
#> $ adjusted_7 : (function (...) ...
#> $ adjusted_8 : (function (...) ...
#> $ adjusted_9 : (function (...) ...
#> $ adjusted_10: (function (...) ...

# And then use it in mutate_at()

FANG %>%
  
  # Group by symbol
  group_by(symbol) %>%
  
  # Splice in the rollers, and call them on the adjusted column
  mutate_at("adjusted", funs(!!!rollers))
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> # A tibble: 4,032 x 17
#> # Groups:   symbol [4]
#>    symbol date        open  high   low close    volume adjusted adjusted_2
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>      <dbl>
#>  1 FB     2013-01-02  27.4  28.2  27.4  28    69846400     28         NA  
#>  2 FB     2013-01-03  27.9  28.5  27.6  27.8  63140600     27.8       27.9
#>  3 FB     2013-01-04  28.0  28.9  27.8  28.8  72715400     28.8       28.3
#>  4 FB     2013-01-07  28.7  29.8  28.6  29.4  83781800     29.4       29.1
#>  5 FB     2013-01-08  29.5  29.6  28.9  29.1  45871300     29.1       29.2
#>  6 FB     2013-01-09  29.7  30.6  29.5  30.6 104787700     30.6       29.8
#>  7 FB     2013-01-10  30.6  31.5  30.3  31.3  95316400     31.3       30.9
#>  8 FB     2013-01-11  31.3  32.0  31.1  31.7  89598000     31.7       31.5
#>  9 FB     2013-01-14  32.1  32.2  30.6  31.0  98892800     31.0       31.3
#> 10 FB     2013-01-15  30.6  31.7  29.9  30.1 173242600     30.1       30.5
#> # ... with 4,022 more rows, and 8 more variables: adjusted_3 <dbl>,
#> #   adjusted_4 <dbl>, adjusted_5 <dbl>, adjusted_6 <dbl>,
#> #   adjusted_7 <dbl>, adjusted_8 <dbl>, adjusted_9 <dbl>,
#> #   adjusted_10 <dbl>

Created on 2018-09-05 by the reprex package (v0.2.0).
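
A note for anyone reading this with a newer dplyr: funs() was deprecated in dplyr 0.8.0 and mutate_at() was superseded by across() in dplyr 1.0.0, so the splicing step above can be replaced by passing the named list of functions directly. Below is a minimal sketch of that pattern, not a drop-in replacement for the reprex. To keep it runnable without tibbletime, it assumes a small made-up tibble (toy) in place of FANG and a hypothetical base-R helper (roll_mean) in place of rollify(mean, window = .x).

```r
library(dplyr)
library(purrr)

# Hypothetical helper (not part of tibbletime): returns a function that
# computes a right-aligned rolling mean of width `window`, padding the
# first `window - 1` positions with NA.
roll_mean <- function(window) {
  function(x) {
    n <- length(x)
    out <- rep(NA_real_, n)
    if (n >= window) {
      cs <- cumsum(c(0, x))
      out[window:n] <- (cs[(window + 1):(n + 1)] - cs[1:(n - window + 1)]) / window
    }
    out
  }
}

# Made-up stand-in for FANG: two symbols, five observations each
toy <- tibble(
  symbol   = rep(c("A", "B"), each = 5),
  adjusted = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

# Named list of rolling functions; the names become the new column names
windows <- 2:3
rollers <- map(windows, roll_mean) %>%
  set_names(paste0("adjusted_", windows))

# across() accepts the named list as-is, so no funs() or !!! is needed
result <- toy %>%
  group_by(symbol) %>%
  mutate(across(adjusted, rollers, .names = "{.fn}")) %>%
  ungroup()

print(result)
```

Because the mutate() runs on a grouped tibble, each window restarts at the first row of every symbol, so the leading NA values appear once per group rather than once for the whole table.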


#4

That is great... I used your gist a few months back to calculate multiple means for a single set of observations. I am now trying to optimize the predictive capacity of a rolling mean of precipitation deviation for predicting groundwater levels at monitoring sites.

If you have any good tutorials on using the !!! operator, it would be appreciated.