How can I apply lm() to specific event, for the pre-event window?

I am conducting an event study with the market model: AR(i,t)=R(i,t) - ((alpha(i) + beta(i)*R(m,t)). I struggle with calculating the alpha(intercept) and beta(slope) estimators because of data format and filtering.

There is an "event" if the Rating Change is non-zero, what I need, is to calculate the alpha and beta for a pre-event-window of -1:-3 (example). However, there are several events per ISIN, meaning I have to group by ISIN and by the event (non-zero rating change) and then do the regression with lm() for the pre-event-window and save the values in columns next to the event and pre-event-window.

Extract of my data looks like this:

    Date_    ISIN       Return  STOXX_Return     Rating_Change     Rating
  2016-10-01 CH00  0.017531563 -0.0003749766              0.00          A
  2016-11-01 CH00 -0.073376071 -0.0220566972              0.00          A
  2016-12-01 CH00 -0.010745412  0.0182991778              0.00          A
  2017-01-01 CH00  0.045742055  0.0641541456              1.90          A
  2017-07-01 CH00 -0.072018814 -0.0193375447              0.00          A
  2018-01-01 GB00  0.041200982  0.0144049186              0.00         B+
  2018-02-01 GB00  0.040654871  0.0119439111              0.00         B+
  2018-03-01 GB00 -0.029012563 -0.0463974419              0.00         B+
  2018-04-01 GB00 -0.073155490 -0.0066808630             -8.90          B
  2018-10-01 GB00  0.042203267  0.0047172371              0.00          B 
  2018-11-01 GB00 -0.073256106 -0.0545350385              0.00          B
....

What I need is another two columns, filled with alpha and beta - like this:

    Date     ISIN            R    STOXX_Return   Rating_Change     Rating alpha beta   
  2016-10-01 CH00  0.017531563 -0.0003749766              0.00          A     1    2
  2016-11-01 CH00 -0.073376071 -0.0220566972              0.00          A     1    2
  2016-12-01 CH00 -0.010745412  0.0182991778              0.00          A     1    2
  2017-01-01 CH00  0.045742055  0.0641541456              1.90          A     1    2
  2018-02-01 GB00  0.040654871  0.0119439111              0.00         B+     3    4
  2018-03-01 GB00 -0.029012563 -0.0463974419              0.00         B+     3    4
  2018-04-01 GB00 -0.073155490 -0.0066808630             -8.90         B+     3    4
... for all Dates and ISINs

1 and 2 (3 and 4) for alpha and beta are just examples to show, for each event I need a specific alpha and beta.

I tried with for-if loops in combination with filter and lm as well as with different attempts of magrittr. However, nothing worked as I wanted probably and obviously due to the fact that I am still a beginner in R. Does anybody understand, what my issue is and how to help/solve it?
Thank you very much in advance for the support.

Hello @antonia ,

I do not fully understand what you are trying to do, but I think you need at least the (three) previous data points before an event. This can be done with the following code (the pivot part is maybe not necessary)

 library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.0.5
  library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:tidyr':
#> 
#>     extract
  
  df = data.frame(
    Date_ = 1:10,
    ISIN = rep(c('CH00','GB00'),length.out=10,each=5),
    Return = c(0.017531563, -0.073376071 , -0.010745412 , 0.045742055, -0.072018814 ,
              0.041200982 , 0.040654871 , -0.029012563 ,  -0.073155490 , 0.042203267  ),
    Rating_Change = c(0,0,0,1.9,0,0,0,0,-8.9,0),
    Rating = c('A','A','A','A','A','B+','B+','B+','B','B')
  )
  df
#>    Date_ ISIN      Return Rating_Change Rating
#> 1      1 CH00  0.01753156           0.0      A
#> 2      2 CH00 -0.07337607           0.0      A
#> 3      3 CH00 -0.01074541           0.0      A
#> 4      4 CH00  0.04574205           1.9      A
#> 5      5 CH00 -0.07201881           0.0      A
#> 6      6 GB00  0.04120098           0.0     B+
#> 7      7 GB00  0.04065487           0.0     B+
#> 8      8 GB00 -0.02901256           0.0     B+
#> 9      9 GB00 -0.07315549          -8.9      B
#> 10    10 GB00  0.04220327           0.0      B
 
  df2 <- df %>%
    group_by(ISIN) %>%
    arrange(Date_) %>%
    mutate (d1 = lag(Return,1),
            d2 = lag(Return,2),
            d3 = lag(Return,3),
            ) %>%
    filter(Rating_Change != 0)
  df2
#> # A tibble: 2 x 8
#> # Groups:   ISIN [2]
#>   Date_ ISIN   Return Rating_Change Rating      d1      d2     d3
#>   <int> <chr>   <dbl>         <dbl> <chr>    <dbl>   <dbl>  <dbl>
#> 1     4 CH00   0.0457           1.9 A      -0.0107 -0.0734 0.0175
#> 2     9 GB00  -0.0732          -8.9 B      -0.0290  0.0407 0.0412
  
  tidyr::pivot_longer(df2,cols=starts_with("d"))
#> # A tibble: 8 x 6
#> # Groups:   ISIN [2]
#>   ISIN   Return Rating_Change Rating name    value
#>   <chr>   <dbl>         <dbl> <chr>  <chr>   <dbl>
#> 1 CH00   0.0457           1.9 A      Date_  4     
#> 2 CH00   0.0457           1.9 A      d1    -0.0107
#> 3 CH00   0.0457           1.9 A      d2    -0.0734
#> 4 CH00   0.0457           1.9 A      d3     0.0175
#> 5 GB00  -0.0732          -8.9 B      Date_  9     
#> 6 GB00  -0.0732          -8.9 B      d1    -0.0290
#> 7 GB00  -0.0732          -8.9 B      d2     0.0407
#> 8 GB00  -0.0732          -8.9 B      d3     0.0412
Created on 2021-05-15 by the reprex package (v2.0.0)

Thank you for your support. It is not exactly what I need, however, it is probably a good start.
I can also use the pivot_longer - however, I need to apply this for both - the "R-Return" (which is for each stock) and the "STOXX_Return" (which is for the market return in each time t).

I applied what you showed as follows(a little bit different arrangement as you proposed):

df2 <- df %>%
  arrange(ISIN) %>%
  mutate (y1 = lag(R,1),
          y2 = lag(R,2),
          y3 = lag(R,3),
          y4 = lag(R,4),) %>%
  mutate (x1 = lag(STOXX_Return,1),
          x2 = lag(STOXX_Return,2),
          x3 = lag(STOXX_Return,3),
          x4 = lag(STOXX_Return,4)
  ) %>%
  filter(Rating_Change != 0)

The output (head) is the following:

      Date         ISIN            R STOXX_Return     Rating_Change     Rating           y1          y2          y3           y4           x1            x2            x3           x4
2017-01-01 CH0012032048  0.045742055  0.064154146              1.90          A -0.010745412 -0.073376071  0.01753156 -0.041243463  0.018299178 -0.0220566972 -0.0003749766  0.013905280
2018-01-01 CH0012032048 -0.006468478  0.014404919             -0.90          A  0.068820950 -0.078049557  0.02507238 -0.002861357 -0.031222670  0.0179348620  0.0374169332 -0.008089635
2016-04-01 GB00B1YW4409  0.015085182 -0.012320754              2.10         B+  0.017573996  -0.090564266 -0.03774363  0.009018911 -0.005780175 -0.0674806982  0.0300996734  0.004980277

Then I wanted to get both, the x values and the y values in a column each (like parallel column for x and one for y). Trying with the following:
df3 <- tidyr::pivot_longer(df2,cols = c(starts_with("x"), starts_with("y")))

Instead of having x (4) and y (4) values parallel (as below), I have now like 8 values for each "event" - all listed one below the other.
Is it possible to do the data arrangement the following way?

  Date       ISIN                R STOXX_Return     Rating_Change    Rating   name(x)   value (x)   name(y)   value (y)
   <date>     <fct>           <dbl>        <dbl>             <dbl> <fct>      <chr>     <dbl>       <chr>     <dbl>
 1 2017-01-01 CH0012032048  0.0457        0.0642               1.9 A          x1     0.0183          y1     -0.010745412  
 2 2017-01-01 CH0012032048  0.0457        0.0642               1.9 A          x2    -0.0221          y2     -0.073376071
 3 2017-01-01 CH0012032048  0.0457        0.0642               1.9 A          x3    -0.000375        y3      0.01753156
 4 2017-01-01 CH0012032048  0.0457        0.0642               1.9 A          x4     0.0139          y4      0.013905280  

Thanks again for your help - I have now at least new approaches to solve my issue.

You could try something like

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.0.5
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

df = data.frame(
    Date_ = 1:10,
    ISIN = rep(c('CH00','GB00'),length.out=10,each=5),
    Return = c(0.017531563, -0.073376071 , -0.010745412 , 0.045742055, -0.072018814 ,
              0.041200982 , 0.040654871 , -0.029012563 ,  -0.073155490 , 0.042203267  ),
    STOXX_Return = c(-0.0003749766, -0.0220566972 , 0.0182991778 , 0.0641541456  , 
                     -0.0193375447 , 0.0144049186 , 0.0119439111 , -0.0463974419 ,  
                     -0.0066808630 , 0.0047172371 ) ,
    Rating_Change = c(0,0,0,1.9,0,0,0,0,-8.9,0),
    Rating = c('A','A','A','A','A','B+','B+','B+','B','B')
  )
df
#>    Date_ ISIN      Return  STOXX_Return Rating_Change Rating
#> 1      1 CH00  0.01753156 -0.0003749766           0.0      A
#> 2      2 CH00 -0.07337607 -0.0220566972           0.0      A
#> 3      3 CH00 -0.01074541  0.0182991778           0.0      A
#> 4      4 CH00  0.04574205  0.0641541456           1.9      A
#> 5      5 CH00 -0.07201881 -0.0193375447           0.0      A
#> 6      6 GB00  0.04120098  0.0144049186           0.0     B+
#> 7      7 GB00  0.04065487  0.0119439111           0.0     B+
#> 8      8 GB00 -0.02901256 -0.0463974419           0.0     B+
#> 9      9 GB00 -0.07315549 -0.0066808630          -8.9      B
#> 10    10 GB00  0.04220327  0.0047172371           0.0      B
df2 <- df %>%
  group_by(ISIN) %>%
  arrange(Date_) %>%
  mutate (y1 = lag(Return,1),
          y2 = lag(Return,2),
          y3 = lag(Return,3),
          y4 = lag(Return,4),) %>%
  mutate (x1 = lag(STOXX_Return,1),
          x2 = lag(STOXX_Return,2),
          x3 = lag(STOXX_Return,3),
          x4 = lag(STOXX_Return,4)
  ) 
df2 
#> # A tibble: 10 x 14
#> # Groups:   ISIN [2]
#>    Date_ ISIN   Return STOXX_Return Rating_Change Rating      y1      y2      y3
#>    <int> <chr>   <dbl>        <dbl>         <dbl> <chr>    <dbl>   <dbl>   <dbl>
#>  1     1 CH00   0.0175    -0.000375           0   A      NA      NA      NA     
#>  2     2 CH00  -0.0734    -0.0221             0   A       0.0175 NA      NA     
#>  3     3 CH00  -0.0107     0.0183             0   A      -0.0734  0.0175 NA     
#>  4     4 CH00   0.0457     0.0642             1.9 A      -0.0107 -0.0734  0.0175
#>  5     5 CH00  -0.0720    -0.0193             0   A       0.0457 -0.0107 -0.0734
#>  6     6 GB00   0.0412     0.0144             0   B+     NA      NA      NA     
#>  7     7 GB00   0.0407     0.0119             0   B+      0.0412 NA      NA     
#>  8     8 GB00  -0.0290    -0.0464             0   B+      0.0407  0.0412 NA     
#>  9     9 GB00  -0.0732    -0.00668           -8.9 B      -0.0290  0.0407  0.0412
#> 10    10 GB00   0.0422     0.00472            0   B      -0.0732 -0.0290  0.0407
#> # ... with 5 more variables: y4 <dbl>, x1 <dbl>, x2 <dbl>, x3 <dbl>, x4 <dbl>
df2 <- df2 %>%
  filter(Rating_Change != 0)
df2
#> # A tibble: 2 x 14
#> # Groups:   ISIN [2]
#>   Date_ ISIN   Return STOXX_Return Rating_Change Rating      y1      y2     y3
#>   <int> <chr>   <dbl>        <dbl>         <dbl> <chr>    <dbl>   <dbl>  <dbl>
#> 1     4 CH00   0.0457      0.0642            1.9 A      -0.0107 -0.0734 0.0175
#> 2     9 GB00  -0.0732     -0.00668          -8.9 B      -0.0290  0.0407 0.0412
#> # ... with 5 more variables: y4 <dbl>, x1 <dbl>, x2 <dbl>, x3 <dbl>, x4 <dbl>
df3 <- tidyr::pivot_longer(df2 %>% select(-c('y1','y2','y3','y4'))
                           ,cols = c('x1','x2','x3','x4'),names_to='var1',values_to='x')
df4 <- tidyr::pivot_longer(df2 %>% select( c('ISIN','y1','y2','y3','y4'))
                           ,cols = c('y1','y2','y3','y4'),names_to='var2',values_to='y')
cbind(df3,df4 %>% ungroup() %>% select(-'ISIN'))
#> # A tibble: 8 x 10
#> # Groups:   ISIN [2]
#>   Date_ ISIN   Return STOXX_Return Rating_Change Rating var1          x var2 
#>   <int> <chr>   <dbl>        <dbl>         <dbl> <chr>  <chr>     <dbl> <chr>
#> 1     4 CH00   0.0457      0.0642            1.9 A      x1     0.0183   y1   
#> 2     4 CH00   0.0457      0.0642            1.9 A      x2    -0.0221   y2   
#> 3     4 CH00   0.0457      0.0642            1.9 A      x3    -0.000375 y3   
#> 4     4 CH00   0.0457      0.0642            1.9 A      x4    NA        y4   
#> 5     9 GB00  -0.0732     -0.00668          -8.9 B      x1    -0.0464   y1   
#> 6     9 GB00  -0.0732     -0.00668          -8.9 B      x2     0.0119   y2   
#> 7     9 GB00  -0.0732     -0.00668          -8.9 B      x3     0.0144   y3   
#> 8     9 GB00  -0.0732     -0.00668          -8.9 B      x4    NA        y4   
#> # ... with 1 more variable: y <dbl>
Created on 2021-05-15 by the reprex package (v2.0.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.