creating a lagged column for every column in a zoo object

I have a zoo object with 4 columns. I want to add a lagged column to every column of the zoo object by adding the name of the column and the"lag_" in front of that. For example if one column, is named "CONS", the lagged column of zoo object should be "lag_CONS". i want this to be done in a loop.

Any suggestions?

This would be easier to answer with a reproducible example. There's good documentation on creating a reprex on the tidyverse page.

packages<-c("tidyverse","zoo","tseries","urca")
sapply(packages,library,character.only=T)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> $tidyverse
#>  [1] "forcats"   "stringr"   "dplyr"     "purrr"     "readr"     "tidyr"    
#>  [7] "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics"  "grDevices"
#> [13] "utils"     "datasets"  "methods"   "base"     
#> 
#> $zoo
#>  [1] "zoo"       "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
#>  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
#> [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
#> 
#> $tseries
#>  [1] "tseries"   "zoo"       "forcats"   "stringr"   "dplyr"     "purrr"    
#>  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
#> [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
#> 
#> $urca
#>  [1] "urca"      "tseries"   "zoo"       "forcats"   "stringr"   "dplyr"    
#>  [7] "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse"
#> [13] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
#> [19] "base"

data<-tibble::tribble(
                 ~Year,    ~CONS,     ~INC,     ~INV,
              "1960Q1", 6.028278, 6.111467, 5.192957,
              "1960Q2", 6.042633, 6.142037, 5.187386,
              "1960Q3", 6.073044, 6.184149, 5.220356,
              "1960Q4", 6.104793, 6.200509, 5.257495,
              "1961Q1",  6.12905, 6.232448, 5.351858,
              "1961Q2", 6.126869, 6.253829, 5.308268,
              "1961Q3",   6.1717,  6.25575, 5.332719,
              "1961Q4", 6.188264, 6.291569, 5.365976,
              "1962Q1",  6.20859, 6.306275, 5.442418
              )


data_zoo<-data %>% zoo(data,order.by = data$Year)
#> Warning in zoo(., data, order.by = data$Year): "order.by" and "frequency" do not
#> match: "frequency" ignored

This is the reprex thing. I want to add a lag and difference for each of the zoo column other than the year, name it with prefix "lag" for lagged columns and "diff" for difference columns.

using names(data_zoo) , i expect something like this,

[1] "Year" "CONS" "INC" "INV" "lag_CONS" "lag_INC"
[7] "lag_INV" "diff_ln_CONS" "diff_ln_INC" "diff_ln_INV"

First, there is a small problem in your example:

data %>%
  zoo(data,order.by = data$Year)

you are providing data twice, first through the pipe (%>%) and then explicitly as first argument. So that is equivalent to:

zoo(data, data, order.by = data$Year)

which R understands as:

zoo(x = data, order.by = data$Year, frequency = data)

hence a warning.

As to your question, you can see ?lag.zoo and ?diff.zoo for documentation. Getting the lag is easy:

lag(data_zoo$CONS)

but getting the difference is a bit more challenging, as a zoo object is a matrix, so everything in data_zoo has to be the same type. Since Year is a character, the whole matrix gets a type of character, and of course you can't compute differences on characters. So you need to explicitly convert back to numeric:

diff(as.numeric(data_zoo$CONS))

Now the question is how to do that for every column, which can be done simply with a loop:

my_cols <- c("CONS", "INC", "INV")
for(cur_col in my_cols){
  lag(data_zoo[,cur_col])
  diff(as.numeric(data_zoo[,cur_col])))
}

But more importantly how to merge that back in the existing zoo object (although, are you sure this is what you want to do?). The way to do that is illustrated in vignette("zoo"). There is one difficulty, the diff() is not a zoo object anymore, so we have to make it back into such an object, while giving the index again.

data_zoo2 <- data_zoo
data_zoo2 <- cbind(data_zoo2,
                     "lag_CONS" = lag(data_zoo[,"CONS"]),
                     "diff_CONS" = zoo(diff(as.numeric(data_zoo[,"CONS"])), data$Year[-1]))

We still have one problem with creating column names on-the-fly. I find it easier to create them separately and then use them on the final zoo object (requires some rep() magic). So that gives a final script:

data_zoo <- data %>%
  zoo(order.by = data$Year)

my_cols <- c("CONS", "INC", "INV")
data_zoo2 <- data_zoo
for(cur_col in my_cols){
  data_zoo2 <- cbind(data_zoo2,
                     lag(data_zoo[,cur_col]),
                     zoo(diff(as.numeric(data_zoo[,cur_col])), data$Year[-1]))
}
new_names <- c(names(data_zoo),
               paste0(rep(c("lag_", "diff_"), times=length(my_cols)),
                      rep(my_cols, each=2)))
names(data_zoo2) <- new_names

data_zoo2
#        Year   CONS     INC      INV      lag_CONS diff_CONS            lag_INC  diff_INC            lag_INV  diff_INV            
# 1960Q1 1960Q1 6.028278 6.111467 5.192957 6.028278 <NA>                 6.111467 <NA>                5.192957 <NA>                
# 1960Q2 1960Q2 6.042633 6.142037 5.187386 6.042633 0.0143550000000001   6.142037 0.03057             5.187386 -0.00557099999999977
# 1960Q3 1960Q3 6.073044 6.184149 5.220356 6.073044 0.030411             6.184149 0.0421119999999995  5.220356 0.0329699999999997  
# 1960Q4 1960Q4 6.104793 6.200509 5.257495 6.104793 0.0317489999999996   6.200509 0.0163600000000006  5.257495 0.0371389999999998  
# 1961Q1 1961Q1 6.129050 6.232448 5.351858 6.129050 0.0242570000000004   6.232448 0.0319389999999995  5.351858 0.0943630000000004  
# 1961Q2 1961Q2 6.126869 6.253829 5.308268 6.126869 -0.00218100000000021 6.253829 0.0213809999999999  5.308268 -0.04359            
# 1961Q3 1961Q3 6.171700 6.255750 5.332719 6.171700 0.0448310000000003   6.255750 0.00192100000000028 5.332719 0.024451            
# 1961Q4 1961Q4 6.188264 6.291569 5.365976 6.188264 0.0165639999999998   6.291569 0.035819            5.365976 0.0332569999999999  
# 1962Q1 1962Q1 6.208590 6.306275 5.442418 <NA>     0.0203259999999998   <NA>     0.0147060000000003  <NA>     0.0764420000000001  
# <NA>   <NA>   <NA>     <NA>     <NA>     <NA>     <NA>                 <NA>     <NA>                <NA>     <NA>                

(technical note: this for loop with a cbind() at each step is only possible because we have very few columns; it would not be appropriate with many columns)

Great. Thanks a lot. Helpful for my case

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.