Creating n lags for all variables in a tibble

Consider this simple tibble:

> tibble(myvar = c(1,2,3,4,5,6),
+        myvar2 = c('a', 'b','c', 'd', 'e', 'f'))
# A tibble: 6 x 2
  myvar myvar2
  <dbl> <chr> 
1     1 a     
2     2 b     
3     3 c     
4     4 d     
5     5 e     
6     6 f

Now let's say you want to create lagged versions of all the variables in the tibble. If only a few lags are needed, this is simply:

> tibble(myvar = c(1,2,3,4,5,6),
+        myvar2 = c('a', 'b','c', 'd', 'e', 'f')) %>%
+   mutate_all(., funs(lag1 = dplyr::lag(.,1),lag2 = dplyr::lag(.,2)))
# A tibble: 6 x 6
  myvar myvar2 myvar_lag1 myvar2_lag1 myvar_lag2 myvar2_lag2
  <dbl> <chr>       <dbl> <chr>            <dbl> <chr>      
1     1 a              NA NA                  NA NA         
2     2 b               1 a                   NA NA         
3     3 c               2 b                    1 a          
4     4 d               3 c                    2 b          
5     5 e               4 d                    3 c          
6     6 f               5 e                    4 d

My issue is that this approach does not scale: to create more than two lagged variables, I have to copy and paste lagn = dplyr::lag(.,n) once for every additional lag.

Is there a more efficient way to create n lagged versions at once?
Thanks!
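For reference, funs() as used in the question was soft-deprecated in dplyr 0.8.0. A minimal sketch of the same two-lag call, assuming dplyr >= 1.0.0 (where across() was introduced), replaces it with a named list of purrr-style lambdas. The names df and df_lagged are only illustrative.

```r
library(dplyr)

df <- tibble(myvar = c(1, 2, 3, 4, 5, 6),
             myvar2 = c("a", "b", "c", "d", "e", "f"))

# Same two lags as the funs() call, written as a named list of lambdas.
# The list names become column suffixes via across()'s default .names = "{.col}_{.fn}".
df_lagged <- df %>%
  mutate(across(everything(),
                list(lag1 = ~ dplyr::lag(.x, 1),
                     lag2 = ~ dplyr::lag(.x, 2))))

df_lagged
```

The values match the mutate_all() output, but across() groups the new columns by source variable (myvar_lag1, myvar_lag2, myvar2_lag1, ...) rather than by lag. This still requires typing one entry per lag, so on its own it does not answer the scaling question.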

One way is to build the named list of lag expressions programmatically instead of typing each one, like this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

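# Build a named list of quoted lag calls: lag1 = lag(., 1), ..., lag4 = lag(., 4)
lags <- purrr::map(1:4, ~rlang::quo(dplyr::lag(., .x))) %>%
  purrr::set_names(paste0("lag", 1:4))

# Apply every lag to every column; new columns are named <column>_<lagname>
tibble::as_tibble(mtcars) %>%
  dplyr::mutate_all(lags)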
#> # A tibble: 32 x 55
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows, and 44 more variables: mpg_lag1 <dbl>,
#> #   cyl_lag1 <dbl>, disp_lag1 <dbl>, hp_lag1 <dbl>, drat_lag1 <dbl>,
#> #   wt_lag1 <dbl>, qsec_lag1 <dbl>, vs_lag1 <dbl>, am_lag1 <dbl>,
#> #   gear_lag1 <dbl>, carb_lag1 <dbl>, mpg_lag2 <dbl>, cyl_lag2 <dbl>,
#> #   disp_lag2 <dbl>, hp_lag2 <dbl>, drat_lag2 <dbl>, wt_lag2 <dbl>,
#> #   qsec_lag2 <dbl>, vs_lag2 <dbl>, am_lag2 <dbl>, gear_lag2 <dbl>,
#> #   carb_lag2 <dbl>, mpg_lag3 <dbl>, cyl_lag3 <dbl>, disp_lag3 <dbl>,
#> #   hp_lag3 <dbl>, drat_lag3 <dbl>, wt_lag3 <dbl>, qsec_lag3 <dbl>,
#> #   vs_lag3 <dbl>, am_lag3 <dbl>, gear_lag3 <dbl>, carb_lag3 <dbl>,
#> #   mpg_lag4 <dbl>, cyl_lag4 <dbl>, disp_lag4 <dbl>, hp_lag4 <dbl>,
#> #   drat_lag4 <dbl>, wt_lag4 <dbl>, qsec_lag4 <dbl>, vs_lag4 <dbl>,
#> #   am_lag4 <dbl>, gear_lag4 <dbl>, carb_lag4 <dbl>

Created on 2019-05-08 by the reprex package (v0.2.1)

There is probably a more streamlined way to do it, but it escapes me at the moment.
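One possible streamlined version, sketched under the assumption of dplyr >= 1.0.0 (which postdates this thread and introduced across()), wraps the lag construction in a small helper. The names add_lags() and make_lag() are hypothetical, used only for illustration. Plain closures stand in for the quosures above, so rlang is not needed.

```r
library(dplyr)

# Hypothetical helper (not part of any package): adds lags 1..n of every column.
add_lags <- function(data, n) {
  # One lag function per lag order k; force(k) pins k's value inside each closure.
  make_lag <- function(k) {
    force(k)
    function(x) dplyr::lag(x, n = k)
  }
  lag_fns <- lapply(seq_len(n), make_lag)
  names(lag_fns) <- paste0("lag", seq_len(n))

  # across() applies every function to every column; new names are "{.col}_{.fn}".
  mutate(data, across(everything(), lag_fns))
}

df <- tibble(myvar = c(1, 2, 3, 4, 5, 6),
             myvar2 = c("a", "b", "c", "d", "e", "f"))

add_lags(df, 3)
add_lags(tibble::as_tibble(mtcars), 4)  # 11 original + 44 lagged = 55 columns, as in the reprex
```

Changing the number of lags then means changing a single argument rather than editing a list of expressions.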

Have you looked at the tsibble package?

How would you do that with tsibble?

Yes, tsibble is one of my favourite packages, but like @von_olaf, I'm not sure how it's applicable here.

There is no time series here, so I'm not sure how I could use it. I'd be happy to see what you have in mind, though!

Nice post on this by @romain:

It has been my go-to.
