Time-series regression panel data

I have this kind of panel data, from which I should do following regression:

ind11_M <- lm(log(testdata$`- 42 days MV...5`) ~ log(testdata$`Book value of equity(thousands)`) 
                                      + log(testdata$`Net Income (thousands)`) 
                                      + (testdata$`Net Income factor`) * (log(testdata$`Net Income (thousands)`) )
                                         + (testdata$`Market leverage -42 days`))
#> Error in eval(predvars, data, env): object 'testdata' not found

I wasn't able to get the correct warning message here as it is "Warning message: NaNs produced

However, the mian probelm that I have is that how do this regression for each year (data from 1995-2019) for each industry ?

Created on 2020-10-27 by the reprex package (v0.3.0)

tibble::tribble(
  ~Year, ~X..42.days.MV...5, ~Book.value.of.equity.thousands., ~Net.Income..thousands., ~Market.leverage..42.days, ~Acquirer.Industry, ~Net.Income.factor,
  2019,             194670,                            45268,                   12523,                         0,       "Industry11",                  0,
  2019,             515040,                           364816,                   18846,         0.352571975206061,       "Industry12",                  0,
  2019,             816870,                           788000,                   74300,          0.60828534025137,       "Industry11",                  0,
  2019,           95654380,                         14561000,                   4e+06,         0.177169313339019,       "Industry12",                  0,
  2018,            1158580,                          6197000,                  104000,         0.907251124357367,       "Industry11",                  0,
  2018,          179889600,                         13980531,                 5770777,         0.326684463935401,       "Industry12",                  0,
  2018,             616920,                           495170,                   17287,          0.59175867625309,       "Industry11",                  0,
  2018,             124710,                           427600,                  -39800,                         0,       "Industry12",                  1,
  2018,             169620,                            88318,                   13120,         0.191319148124663,       "Industry12",                  0,
  2018,            2634050,                          3402000,                  153000,         0.754493641095903,       "Industry11",                  0
)
#> # A tibble: 10 x 7
#>     Year X..42.days.MV..… Book.value.of.e… Net.Income..tho… Market.leverage…
#>    <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1  2019           194670            45268            12523            0    
#>  2  2019           515040           364816            18846            0.353
#>  3  2019           816870           788000            74300            0.608
#>  4  2019         95654380         14561000          4000000            0.177
#>  5  2018          1158580          6197000           104000            0.907
#>  6  2018        179889600         13980531          5770777            0.327
#>  7  2018           616920           495170            17287            0.592
#>  8  2018           124710           427600           -39800            0    
#>  9  2018           169620            88318            13120            0.191
#> 10  2018          2634050          3402000           153000            0.754
#> # … with 2 more variables: Acquirer.Industry <chr>, Net.Income.factor <dbl>

Created on 2020-10-27 by the reprex package (v0.3.0)

Hi!
Do I understand you correctly in that you want to fit a model for each possible combination of industry and year? In that case, the purrr::map() function is a good way of avoiding to write a for loop. The general workflow is to create a nested data frame with one row for each combination of year and industry with one column that again contains a data frame with all the data for the respective combination (hence the name nested df). Then you use map() to apply your model on the data in each row, which gives you a seperate model for each row.

suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
})

mydf<-tibble::tribble(
  ~Year, ~X..42.days.MV...5, ~Book.value.of.equity.thousands., ~Net.Income..thousands., ~Market.leverage..42.days, ~Acquirer.Industry, ~Net.Income.factor,
  2019,             194670,                            45268,                   12523,                         0,       "Industry11",                  0,
  2019,             515040,                           364816,                   18846,         0.352571975206061,       "Industry12",                  0,
  2019,             816870,                           788000,                   74300,          0.60828534025137,       "Industry11",                  0,
  2019,           95654380,                         14561000,                   4e+06,         0.177169313339019,       "Industry12",                  0,
  2018,            1158580,                          6197000,                  104000,         0.907251124357367,       "Industry11",                  0,
  2018,          179889600,                         13980531,                 5770777,         0.326684463935401,       "Industry12",                  0,
  2018,             616920,                           495170,                   17287,          0.59175867625309,       "Industry11",                  0,
  2018,             124710,                           427600,                  -39800,                         0,       "Industry12",                  1,
  2018,             169620,                            88318,                   13120,         0.191319148124663,       "Industry12",                  0,
  2018,            2634050,                          3402000,                  153000,         0.754493641095903,       "Industry11",                  0
)

# create a nested df with one row per industry/year combination
mydf_nest<-mydf%>%
  nest(data=-c(Year,Acquirer.Industry))

mydf_nest
#> # A tibble: 4 x 3
#>    Year Acquirer.Industry data            
#>   <dbl> <chr>             <list>          
#> 1  2019 Industry11        <tibble [2 x 5]>
#> 2  2019 Industry12        <tibble [2 x 5]>
#> 3  2018 Industry11        <tibble [3 x 5]>
#> 4  2018 Industry12        <tibble [3 x 5]>

# create a function for the linear model                      
model_fct<-function(df){
  lm(log(X..42.days.MV...5)~log(Book.value.of.equity.thousands.+
                                 log(Net.Income..thousands.) + 
                                 Net.Income.factor * (log(Net.Income..thousands.) +
                                 Market.leverage..42.days)),data = df)
}

# apply the linnear model on each row 
mydf_modeled<-mydf_nest%>%
  mutate(models=data%>%map(model_fct))
#> Warning: Problem with `mutate()` input `models`.
#> i NaNs produced
#> i Input `models` is `data %>% map(model_fct)`.
#> Warning in log(Net.Income..thousands.): NaNs produced
#> Warning: Problem with `mutate()` input `models`.
#> i NaNs produced
#> i Input `models` is `data %>% map(model_fct)`.
#> Warning in log(Net.Income..thousands.): NaNs produced

# tidy up and unnest your results - Two rows per model
mydf_modeled%>%
  mutate(models_tidy=map(models,tidy))%>%
  unnest(models_tidy)
#> # A tibble: 8 x 9
#>    Year Acquirer.Indust~ data  models term  estimate std.error statistic p.value
#>   <dbl> <chr>            <lis> <list> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
#> 1  2019 Industry11       <tib~ <lm>   (Int~    6.80    NaN       NaN     NaN    
#> 2  2019 Industry11       <tib~ <lm>   log(~    0.502   NaN       NaN     NaN    
#> 3  2019 Industry12       <tib~ <lm>   (Int~   -5.00    NaN       NaN     NaN    
#> 4  2019 Industry12       <tib~ <lm>   log(~    1.42    NaN       NaN     NaN    
#> 5  2018 Industry11       <tib~ <lm>   (Int~    8.59      5.95      1.44    0.386
#> 6  2018 Industry11       <tib~ <lm>   log(~    0.373     0.406     0.917   0.527
#> 7  2018 Industry12       <tib~ <lm>   (Int~   -3.63    NaN       NaN     NaN    
#> 8  2018 Industry12       <tib~ <lm>   log(~    1.38    NaN       NaN     NaN

Created on 2020-10-28 by the reprex package (v0.3.0)

If you want to know more about this workflow, you can have a look at the chapter about many models in R four datascience (Chapter 25), it describes the workflow concisely and gives you more examples in how it can be used.

Regarding the NaNs you are getting, you have a negative value in the Net.Income..thousands. column in your sample data which will give you a NaN when you try to take the logarithm.

1 Like

Hi,

Thank you very much for the answer, I see this as a very helpful!

As I tried to do this, I encountered to error, when I tried to apply the linnear model on each row: Even though I could get it to this table is "Error 0 (non-NA) cases"

In the above example you created, is the "df" in the "create function for the linear model" part same as that "mydf" which refers to the whole dataframe?

mydf_modeled<-mydf_nest%>%
         mutate(models=data%>%map(model_fct))
#> Error in mydf_nest %>% mutate(models = data %>% map(model_fct)): could not find function "%>%"

Created on 2020-10-28 by the reprex package (v0.3.0)

When you create a custom function in R, you specify the input that the function takes after the function call. So the df argument is just something like a placeholder for your actual data.

However, the error message you are getting says could not find function "%>%". Did you load the packages that are mentioned at the beginning of the code, and did you maybe get a similar error message when you create the nested data frame?

Thank you for time and help, I think I will get this to work with these instructions. I'll came back as I get the results, as I would have then as a next step include the industry average Ln(M) = average intercept to this next equation 11 for each firm in the data, based on which industry it is in.

Now I have gathered all the industry average LN(M) coefficients to this table:

tibble::tribble(
           ~V1,        ~V2,
   "Industry1",  1.3104855,
  "Industry10",  4.3766684,
  "Industry11",  2.1301167,
  "Industry12",  0.5803704,
   "Industry2", 15.0790926,
   "Industry3",  4.5219326,
   "Industry4",  4.8040403,
   "Industry5", 16.9073884,
   "Industry6",  6.8916313,
   "Industry7", 10.9153588,
   "Industry8", 14.9359291,
   "Industry9",   7.111956
  )
#> # A tibble: 12 x 2
#>    V1             V2
#>    <chr>       <dbl>
#>  1 Industry1   1.31 
#>  2 Industry10  4.38 
#>  3 Industry11  2.13 
#>  4 Industry12  0.580
#>  5 Industry2  15.1  
#>  6 Industry3   4.52 
#>  7 Industry4   4.80 
#>  8 Industry5  16.9  
#>  9 Industry6   6.89 
#> 10 Industry7  10.9  
#> 11 Industry8  14.9  
#> 12 Industry9   7.11

Created on 2020-10-28 by the reprex package (v0.3.0)

Now I have to do the regression (eq 11) for each firm, and apply the correct industry specific intercept from the table above to that formula.Is there any way in R, that it would automatically refer to Industry 1 coefficient, if the firm is in industry 1. The table which I use is similar to the table in my first post above.

Additionally, I would like to ask if someone knows how I can create following table(Model III) from the regression that I have already completed?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.