lag value in panel data

how to create lag value of variable in panel data.

Thank you in advance.

Is dplyr::lag() what you're looking for?

(dat <- dplyr::tibble(month = month.abb))
#> # A tibble: 12 × 1
#>    month
#>    <chr>
#>  1 Jan  
#>  2 Feb  
#>  3 Mar  
#>  4 Apr  
#>  5 May  
#>  6 Jun  
#>  7 Jul  
#>  8 Aug  
#>  9 Sep  
#> 10 Oct  
#> 11 Nov  
#> 12 Dec

dplyr::transmute(
  dat,
  last_month = dplyr::lag(month),
  month = month,
  next_month = dplyr::lead(month)
)
#> # A tibble: 12 × 3
#>    last_month month next_month
#>    <chr>      <chr> <chr>     
#>  1 <NA>       Jan   Feb       
#>  2 Jan        Feb   Mar       
#>  3 Feb        Mar   Apr       
#>  4 Mar        Apr   May       
#>  5 Apr        May   Jun       
#>  6 May        Jun   Jul       
#>  7 Jun        Jul   Aug       
#>  8 Jul        Aug   Sep       
#>  9 Aug        Sep   Oct       
#> 10 Sep        Oct   Nov       
#> 11 Oct        Nov   Dec       
#> 12 Nov        Dec   <NA>

Created on 2023-02-25 with reprex v2.0.2

1 Like

Unfortunately, this doesn't work in general for a panel. lag() just takes the previous row without knowing anything about the panel structure of the data. Notice that in row 13 of the output Last_CrossSection is "A". It should be NA.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
dat <- dplyr::tibble(month = c(month.abb,month.abb),
                      crossSection =
                        c(rep("A",12),rep("B",12)))
dat <- dplyr::mutate(
  dat,
  last_month = dplyr::lag(month),
  last_crossSection = dplyr::lag(crossSection))
print(dat,n = 24)
#> # A tibble: 24 × 4
#>    month crossSection last_month last_crossSection
#>    <chr> <chr>        <chr>      <chr>            
#>  1 Jan   A            <NA>       <NA>             
#>  2 Feb   A            Jan        A                
#>  3 Mar   A            Feb        A                
#>  4 Apr   A            Mar        A                
#>  5 May   A            Apr        A                
#>  6 Jun   A            May        A                
#>  7 Jul   A            Jun        A                
#>  8 Aug   A            Jul        A                
#>  9 Sep   A            Aug        A                
#> 10 Oct   A            Sep        A                
#> 11 Nov   A            Oct        A                
#> 12 Dec   A            Nov        A                
#> 13 Jan   B            Dec        A                
#> 14 Feb   B            Jan        B                
#> 15 Mar   B            Feb        B                
#> 16 Apr   B            Mar        B                
#> 17 May   B            Apr        B                
#> 18 Jun   B            May        B                
#> 19 Jul   B            Jun        B                
#> 20 Aug   B            Jul        B                
#> 21 Sep   B            Aug        B                
#> 22 Oct   B            Sep        B                
#> 23 Nov   B            Oct        B                
#> 24 Dec   B            Nov        B

This could be addressed with group_by() which will alter the lead lag behaviour to be in group

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.