Shifting row wise for time series analysis

AbhishekHP · February 8, 2019, 4:16pm

Hi Experts,

How can we lag the series in order to feed it to the models ?´

library(dplyr)
# Sample data
eg <- data.frame(dat = c("a","b","c"), Jan = c(1,2,3), Feb = c(4,5,6), Mar = c(7,8,9))
# Repeat each row
eg %>% slice(rep(1:n(), each = 3))
# expected output
exp_out <- data.frame(dat = c(rep("a",3),rep("b",3),rep("c",3)),
                      somecolumnName1 = c(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
                      somecolumnName2 = c(NaN,NaN,1,NaN,NaN,2,NaN,NaN,3),
                      somecolumnName3 = c(NaN,1,4,NaN,2,5,NaN,3,6),
                      Jan = c(1,4,7,2,5,8,3,6,9),
                      Feb = c(4,7,NA,5,8,NA,6,9,NA),
                      Mar = c(7,NA,NA,8,NA,NA,9,NA,NA))

This is actually from 3 rows and 3 months but how can we do the same shifting operations for 30,000 rows and 36 months ?

Tried with dplyr::lag but it shifts column wise but not row wise

Could you please advise me

andresrcs · February 8, 2019, 5:04pm

Could you elaborate a little more in your desired output? I'm not getting the logic behind it, usually when feeding time series to a model they are on a long format like this

library(dplyr)
library(tidyr)

eg <- data.frame(dat = c("a","b","c"), Jan = c(1,2,3), Feb = c(4,5,6), Mar = c(7,8,9))

eg %>%
    gather(month, value, Jan:Mar) %>%
    mutate(month = factor(month, levels = c('Jan', 'Feb', 'Mar'))) %>% 
    arrange(dat) %>%
    spread(dat, value) %>% 
    mutate_if(is.numeric, funs(lag = lag(.)))
#>   month a b c a_lag b_lag c_lag
#> 1   Jan 1 2 3    NA    NA    NA
#> 2   Feb 4 5 6     1     2     3
#> 3   Mar 7 8 9     4     5     6

^{Created on 2019-02-08 by the reprex package (v0.2.1)}

jlacko · February 9, 2019, 11:40am

Like @andresrcs I fail to grasp the desired output, so I can offer only a generic piece of advice.

One thing to clarify: dplyr::lag() does shift rowwise, not columnwise as you mention. A small example (note the n parameter, which might help you in shifting by 36 months - if I understand correctly is what you desire)

library(dplyr)

frame <- data.frame(data = letters[1:15]) # some sample data

frame <- frame %>% # now shift it
  mutate(default_lag = lag(data), # default n = 1 - previous row :)
         fancy_lag = lag(data, n = 3)) # or what not...

print(frame)
#>    data default_lag fancy_lag
#> 1     a        <NA>      <NA>
#> 2     b           a      <NA>
#> 3     c           b      <NA>
#> 4     d           c         a
#> 5     e           d         b
#> 6     f           e         c
#> 7     g           f         d
#> 8     h           g         e
#> 9     i           h         f
#> 10    j           i         g
#> 11    k           j         h
#> 12    l           k         i
#> 13    m           l         j
#> 14    n           m         k
#> 15    o           n         l

AbhishekHP · February 13, 2019, 3:20pm

Thanks for the solution.

frame <- frame %>% # now shift it
  mutate(default_lag = lag(data), # default n = 1 - previous row)

solves the problem But
I got to shift like this every row different number of times i.e
1st row lag by 1
2nd row lag by 2
........
36th row lag by 36
Also, since the data has 10,000 rows, I need to shift every 36 rows in this cyclic fashion i.e
1st, 37th,...... should be shifted by 1 or lag by 1
2nd, 38th ...... should be shifted by 2 or lag by 2
.......
36th, 72th ...... should be shifted by 36 or lag by 36

is there a better way to do this instead of creating default_lag1,default_lag2,..........default_lag36 and also in a cyclic fashion as described

andresrcs · February 13, 2019, 3:25pm

Yes, you can use mutate_at() or mutate_if()
If you turn this into a reproducible example, with a sample of your actual data, we can give you more specific help.

AbhishekHP · February 13, 2019, 4:39pm

Thanks for suggestion, I updated my request. Hope my description is clear else I will have to come up with a sample to demo.

AbhishekHP · February 14, 2019, 9:37am

Tried to explain the expected output considering 6*6 table and lagging by 2. but the real dataset is much bigger consisting 10,000 rows and 50 columns and needs to be lagged by 18 times cyclically as described below:

# Input dataset
eg <- data.frame(Month = c("a","b","c","d","e","f"), Code = c(11111,22222,33333,44444,66666,23445),
                 col1 = c(1,2,3,4,5,6), col2 = c(3,5,6,7,8,9), col3 = c(2,3,5,7,8,9), col4 = c(1,6,3,5,2,8), col5 = c(2,9,53,67,12,33))
# Multiply each row by 3 times (Number of times to lag minus 1)
rep_eg <- eg %>% slice(rep(1:n(), each = 3))

# Rename the column based on lagging set point : m
colnames(rep_eg) <- c("Month","Code","m-2","m-1","m","m+1","m+2")

# Output as of now 
current_out <- rep_eg %>% mutate("m-1" = lag(m),
                  "m-2" = lag(m,n = 2))
(current_out)

# Expected output : Shift to be applied **cyclically** on the shifted column i.e.
expected_output = current_out
expected_output$`m-2` = c(NA,NA,2,NA,NA,3,NA,NA,5,NA,NA,7,NA,NA,8,NA,NA,9)
expected_output$`m-1` = c(NA,2,2,NA,3,3,NA,5,5,NA,7,7,NA,8,8,NA,9,9)

# Expected output 
(expected output)

Here is the demo of lagging 2 columns (first column by 1 time and second by 2 times) Is there way to lag in this manner for 36 columns (first columns by 1 time , second by 2 times......36th by 36 times)**
How to obtain such expected output described

Thanks in advance

andresrcs · February 15, 2019, 12:49am

I think that the solution would be to use some map_() function, but I'm not that skilled with purrr, so this is a starting point for inviting other users to build upon it or prove me wrong.

library(dplyr)
library(purrr)

eg <- data.frame(Month = c("a","b","c","d","e","f"),
                 Code = c(11111,22222,33333,44444,66666,23445),
                 m = c(1,2,3,4,5,6)
                 )
eg <- eg %>% slice(rep(1:n(), each = 3))
  
map_dfc(2:1, ~(eg %>%
                 group_by(Code) %>% 
                 mutate(!!paste0("m-", .x) := lag(m, n = .x)) %>%
                 ungroup() %>%
                 select(Code, !!paste0("m-", .x))
           )
    ) %>% 
  select(Code, starts_with("m")) %>% 
  left_join(eg %>% distinct(Code, m), by = 'Code')

# A tibble: 18 x 4
    Code `m-2` `m-1`     m
   <dbl> <dbl> <dbl> <dbl>
 1 11111    NA    NA     1
 2 11111    NA     1     1
 3 11111     1     1     1
 4 22222    NA    NA     2
 5 22222    NA     2     2
 6 22222     2     2     2
 7 33333    NA    NA     3
 8 33333    NA     3     3
 9 33333     3     3     3
10 44444    NA    NA     4
11 44444    NA     4     4
12 44444     4     4     4
13 66666    NA    NA     5
14 66666    NA     5     5
15 66666     5     5     5
16 23445    NA    NA     6
17 23445    NA     6     6
18 23445     6     6     6

P.D.: I'm kind of following JD Long's example here
https://twitter.com/CMastication/status/1039847486502916096?s=09

system · February 22, 2019, 12:49am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.