Using dplyr to Generate Multiple Lags

john.smith · March 1, 2021, 9:01am

Hi,

I am trying to generate multiple lags in my dataset to see whether certain days correlate with each other. I am trying to use dplyr and setNames for this, based on a post by Dr Simon Jackson on GitHub.

My code is below. When I ask for 6 lags, I get the following error:

Error in setNames(paste("apply_lags(., ", lags, ")")) :
argument "nm" is missing, with no default

Does anyone know where I am going wrong?


# Here we create many lag functions using SetNames
apply_lags <- function(k, mydf) {
  
  label = glue::glue("lag_{k}_day")
  
  mydf %>% 
    mutate("lag_{{k}}_day" := lag(shps, n = k))
  
}

# Create a dataframe
x = seq(2,20,2) %>% 
  enframe()

# We set the number of lags we are interested in to 3
lags <- seq(1:3)
# ERROR HERE
lag_functions <- setNames(paste("apply_lags(., ", lags, ")"))

# Apply this to all the functions we have created  
x %>% 
  mutate_at(vars(lags), funs_(lag_functions))

nirgrahamuk · March 1, 2021, 9:29am

Why not use Dr Jackson's working code exactly as it is?
Your version introduces a mystery apply_lags function that we have no knowledge of. Is there a reason you are avoiding dplyr::lag?

john.smith · March 1, 2021, 9:31am

Oh, sorry,

I thought I had included the function.

Please find the updated function in the original question.

I'm actually just trying to reason through Dr Jackson's code so that I understand it better in my own terms.

john.smith · March 1, 2021, 9:50am

Hi All,

I have managed to get it to work.
The error arose because setNames() was called without its nm argument, so the generated function calls had no names. Passing nm = label in the setNames() call that builds lag_functions solved it.


library(tidyverse)

# Thin wrapper around dplyr::lag(); despite its name, mydf receives a column vector
apply_lags <- function(mydf, k) {
  lag(mydf, n = k)
}

# Create a dataframe
x = seq(2,20,2) %>% 
  enframe()

# We set the number of lags we are interested in to 4
lags <- seq(1:4)
label = glue::glue("lag_{lags}_day") %>% 
  as.character()

lag_functions <- setNames(paste("apply_lags(., ", lags, ")"), nm = label)

# Apply every generated lag call to the value column
x %>% 
  mutate_at(vars(value), funs_(lag_functions))
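
For anyone who hits the same message, here is a minimal base R sketch of what went wrong and what the fix changes. It assumes nothing beyond base R; the strings only mimic the generated calls and are never evaluated.

```r
# setNames(object, nm) needs the names as its second argument.
lags <- 1:3
calls <- paste0("lag(., ", lags, ")")

# Leaving out nm reproduces the error from the original question;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(
  setNames(calls),
  error = function(e) conditionMessage(e)
)

# Supplying nm gives each generated call a name, which later
# becomes the name of the new column.
label <- paste0("lag_", lags, "_day")
lag_functions <- setNames(calls, nm = label)

print(msg)
print(lag_functions)
```

The names attached here are what mutate_at() uses to label the new columns, which is why the call cannot work without them.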

nirgrahamuk · March 1, 2021, 10:19am

[redacted: John had already worked out the points I was writing about before I hit post, so I am just leaving the rest]

Here I have made the code somewhat more generic. I suppose the extra abstraction adds another layer of complexity to understand, although it is perhaps more explicit?

library(tidyverse)
library(glue)
d <- tibble(x = seq_len(100))  # tibble() replaces the deprecated data_frame()
d

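gen_fun_maker <- function(func_as_string,
                          to_do,
                          gluestr) {
  to_do_vec <- seq_len(to_do)
  # Build the names by evaluating gluestr, which may refer to to_do_vec
  gen_names <- glue('glue("{gluestr}")') %>%
    str2expression %>% eval

  # One call string per lag, named with the generated labels
  setNames(glue("{func_as_string}(., {to_do_vec})"),
           gen_names)
}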

#using the above 

myfuns <- gen_fun_maker("dplyr::lag",5,"lag_{to_do_vec}_day")
d %>% mutate_at(vars(x), funs_(myfuns))

myfuns_2 <- gen_fun_maker("dplyr::lead",5,"lead_{to_do_vec}_day")
d %>% mutate_at(vars(x), funs_(myfuns_2))
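
As a rough alternative to building the calls as strings, the same columns can be produced from a named list of ordinary functions passed to across(). This is only a sketch, assuming dplyr 1.0.0 or later; make_lag() is a hypothetical helper introduced here, not something from the code above.

```r
library(dplyr)

d <- tibble(x = seq_len(10))
lags <- 1:3

# make_lag() is a hypothetical helper: it returns a function that lags
# its input by k. force(k) pins the value of k at creation time.
make_lag <- function(k) {
  force(k)
  function(v) dplyr::lag(v, n = k)
}

# Named list of functions; the names become the column suffixes.
lag_fns <- setNames(lapply(lags, make_lag), paste0("lag_", lags, "_day"))

# across() applies every function in the list to x and names the
# results "{column}_{function name}" by default.
out <- d %>% mutate(across(x, lag_fns))
print(out)
```

Swapping dplyr::lag for dplyr::lead inside make_lag() would give the lead version in the same way.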

TimTeaFan · March 1, 2021, 10:38am

I think the new mutate() lets us use map_dfc for this kind of task, either directly, to lag one variable, or inside across(), to lag many variables.

library(tidyverse)

# some data 
x = seq(2,20,2) %>%
  enframe() %>% 
  mutate(beta = value * 2)

lags <- seq(1:3)

# approach with one variable
myvars <- lags %>% set_names(., paste0("value_lag", .))

x %>%
  mutate(map_dfc(myvars,
                 ~ lag(value, .x)))
#> # A tibble: 10 x 6
#>     name value  beta value_lag1 value_lag2 value_lag3
#>    <int> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
#>  1     1     2     4         NA         NA         NA
#>  2     2     4     8          2         NA         NA
#>  3     3     6    12          4          2         NA
#>  4     4     8    16          6          4          2
#>  5     5    10    20          8          6          4
#>  6     6    12    24         10          8          6
#>  7     7    14    28         12         10          8
#>  8     8    16    32         14         12         10
#>  9     9    18    36         16         14         12
#> 10    10    20    40         18         16         14


# approach with more than one variable

myvars <- lags %>%
  set_names(., paste0("lag", .))  

x %>% 
  mutate(across(c(value, beta),
              ~ map_dfc(myvars,
                        function(y) lag(.x, y))
)) %>% 
  do.call(data.frame, .)
#>    name value.lag1 value.lag2 value.lag3 beta.lag1 beta.lag2 beta.lag3
#> 1     1         NA         NA         NA        NA        NA        NA
#> 2     2          2         NA         NA         4        NA        NA
#> 3     3          4          2         NA         8         4        NA
#> 4     4          6          4          2        12         8         4
#> 5     5          8          6          4        16        12         8
#> 6     6         10          8          6        20        16        12
#> 7     7         12         10          8        24        20        16
#> 8     8         14         12         10        28        24        20
#> 9     9         16         14         12        32        28        24
#> 10   10         18         16         14        36        32        28

Created on 2021-03-01 by the reprex package (v0.3.0)
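
A small sketch of why the two approaches above behave differently, assuming dplyr 1.0.0 or later: when an expression inside mutate() returns an unnamed data frame, its columns are spliced into the result, whereas a named one is stored as a single data frame column. The second case is what across() produces, hence the do.call(data.frame, .) step to flatten it. The column names below are made up for illustration.

```r
library(dplyr)

x <- tibble(value = seq(2, 20, 2))

# Unnamed data frame inside mutate(): its columns are added directly.
spliced <- x %>%
  mutate(tibble(value_lag1 = lag(value, 1),
                value_lag2 = lag(value, 2)))

# Named data frame inside mutate(): stored as one packed column.
packed <- x %>%
  mutate(lags = tibble(lag1 = lag(value, 1)))

print(names(spliced))
print(is.data.frame(packed$lags))
```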
