Creating n lags for all variables in a tibble

Consider this simple tibble:

> tibble(myvar = c(1,2,3,4,5,6),
+        myvar2 = c('a', 'b','c', 'd', 'e', 'f'))
# A tibble: 6 x 2
  myvar myvar2
  <dbl> <chr> 
1     1 a     
2     2 b     
3     3 c     
4     4 d     
5     5 e     
6     6 f

Now let's say you want to create lagged versions of all the variables in the tibble. If the number of lags is small, this is simple:

> tibble(myvar = c(1,2,3,4,5,6),
+        myvar2 = c('a', 'b','c', 'd', 'e', 'f')) %>%
+   mutate_all(., funs(lag1 = dplyr::lag(.,1),lag2 = dplyr::lag(.,2)))
# A tibble: 6 x 6
  myvar myvar2 myvar_lag1 myvar2_lag1 myvar_lag2 myvar2_lag2
  <dbl> <chr>       <dbl> <chr>            <dbl> <chr>      
1     1 a              NA NA                  NA NA         
2     2 b               1 a                   NA NA         
3     3 c               2 b                    1 a          
4     4 d               3 c                    2 b          
5     5 e               4 d                    3 c          
6     6 f               5 e                    4 d

My issue is that this code is inefficient if I need to create more than 2 lagged variables, because I need to copy and paste lagn = dplyr::lag(.,n) every time.

Is there a more efficient way to create n lagged versions at once?

One way is to create functions that create lags explicitly like this:

library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union

lags <- purrr::map(1:4, ~rlang::quo(dplyr::lag(., .x))) %>%
  purrr::set_names(paste0("lag", 1:4))

tibble::as_tibble(mtcars) %>%
  mutate_all(lags)
#> # A tibble: 32 x 55
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows, and 44 more variables: mpg_lag1 <dbl>,
#> #   cyl_lag1 <dbl>, disp_lag1 <dbl>, hp_lag1 <dbl>, drat_lag1 <dbl>,
#> #   wt_lag1 <dbl>, qsec_lag1 <dbl>, vs_lag1 <dbl>, am_lag1 <dbl>,
#> #   gear_lag1 <dbl>, carb_lag1 <dbl>, mpg_lag2 <dbl>, cyl_lag2 <dbl>,
#> #   disp_lag2 <dbl>, hp_lag2 <dbl>, drat_lag2 <dbl>, wt_lag2 <dbl>,
#> #   qsec_lag2 <dbl>, vs_lag2 <dbl>, am_lag2 <dbl>, gear_lag2 <dbl>,
#> #   carb_lag2 <dbl>, mpg_lag3 <dbl>, cyl_lag3 <dbl>, disp_lag3 <dbl>,
#> #   hp_lag3 <dbl>, drat_lag3 <dbl>, wt_lag3 <dbl>, qsec_lag3 <dbl>,
#> #   vs_lag3 <dbl>, am_lag3 <dbl>, gear_lag3 <dbl>, carb_lag3 <dbl>,
#> #   mpg_lag4 <dbl>, cyl_lag4 <dbl>, disp_lag4 <dbl>, hp_lag4 <dbl>,
#> #   drat_lag4 <dbl>, wt_lag4 <dbl>, qsec_lag4 <dbl>, vs_lag4 <dbl>,
#> #   am_lag4 <dbl>, gear_lag4 <dbl>, carb_lag4 <dbl>

Created on 2019-05-08 by the reprex package (v0.2.1)

There is probably a more streamlined way to do it, but it escapes me at the moment.
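
For what it's worth, here is a minimal sketch of a more compact variant. It assumes dplyr >= 1.0.0, which is newer than the version used in the reprex above, so that `across()` is available. The names `n_lags`, `lag_funs`, `df` and `lagged` are hypothetical and only there for illustration; the data is the small tibble from the question.

```r
library(dplyr)

n_lags <- 4

# Build a named list of functions: lag1, lag2, ..., lag4.
# force(k) makes sure each closure keeps its own lag value.
lag_funs <- lapply(seq_len(n_lags), function(k) {
  force(k)
  function(x) dplyr::lag(x, k)
})
names(lag_funs) <- paste0("lag", seq_len(n_lags))

df <- tibble(myvar = c(1, 2, 3, 4, 5, 6),
             myvar2 = c("a", "b", "c", "d", "e", "f"))

# across() applies every function in the list to every column and, by
# default, names the results "{column}_{function name}", e.g. myvar_lag1.
lagged <- df %>%
  mutate(across(everything(), lag_funs))
```

Changing `n_lags` is then the only edit needed to get more or fewer lags. Note that the new columns come out grouped by variable (all lags of `myvar`, then all lags of `myvar2`) rather than by lag.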


Have you looked at the package tsibble?

How would you do that with tsibble?

Yes, tsibble is one of my favourite packages, but like @von_olaf, I'm not sure how it's applicable here.

There is no time series here, so I'm not sure how I could use it. I'd be happy to see what you have in mind, though!

Nice post for this by @romain:

It has been my go-to 🙂

