Hi!
Do I understand correctly that you want to fit a model for each combination of industry and year? In that case, purrr::map() is a good way to avoid writing a for loop. The general workflow is to create a nested data frame with one row per combination of year and industry, plus a list-column that holds a data frame with all the data for that combination (hence the name nested df). You then use map() to apply your model to the data in each row, which gives you a separate model for each row.
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
})
mydf<-tibble::tribble(
~Year, ~X..42.days.MV...5, ~Book.value.of.equity.thousands., ~Net.Income..thousands., ~Market.leverage..42.days, ~Acquirer.Industry, ~Net.Income.factor,
2019, 194670, 45268, 12523, 0, "Industry11", 0,
2019, 515040, 364816, 18846, 0.352571975206061, "Industry12", 0,
2019, 816870, 788000, 74300, 0.60828534025137, "Industry11", 0,
2019, 95654380, 14561000, 4e+06, 0.177169313339019, "Industry12", 0,
2018, 1158580, 6197000, 104000, 0.907251124357367, "Industry11", 0,
2018, 179889600, 13980531, 5770777, 0.326684463935401, "Industry12", 0,
2018, 616920, 495170, 17287, 0.59175867625309, "Industry11", 0,
2018, 124710, 427600, -39800, 0, "Industry12", 1,
2018, 169620, 88318, 13120, 0.191319148124663, "Industry12", 0,
2018, 2634050, 3402000, 153000, 0.754493641095903, "Industry11", 0
)
# create a nested df with one row per industry/year combination
mydf_nest <- mydf %>%
  nest(data = -c(Year, Acquirer.Industry))
mydf_nest
#> # A tibble: 4 x 3
#>    Year Acquirer.Industry data
#>   <dbl> <chr>             <list>
#> 1  2019 Industry11        <tibble [2 x 5]>
#> 2  2019 Industry12        <tibble [2 x 5]>
#> 3  2018 Industry11        <tibble [3 x 5]>
#> 4  2018 Industry12        <tibble [3 x 5]>
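# create a function for the linear model
# note: as the parentheses are placed, the whole right-hand side sits inside
# a single log(), so each fitted model has an intercept and one predictor term
model_fct <- function(df) {
  lm(log(X..42.days.MV...5) ~ log(Book.value.of.equity.thousands. +
       log(Net.Income..thousands.) +
       Net.Income.factor * (log(Net.Income..thousands.) +
         Market.leverage..42.days)), data = df)
}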
# apply the linear model to the data in each row
mydf_modeled <- mydf_nest %>%
  mutate(models = data %>% map(model_fct))
#> Warning: Problem with `mutate()` input `models`.
#> i NaNs produced
#> i Input `models` is `data %>% map(model_fct)`.
#> Warning in log(Net.Income..thousands.): NaNs produced
#> Warning: Problem with `mutate()` input `models`.
#> i NaNs produced
#> i Input `models` is `data %>% map(model_fct)`.
#> Warning in log(Net.Income..thousands.): NaNs produced
# tidy up and unnest your results - Two rows per model
mydf_modeled %>%
  mutate(models_tidy = map(models, tidy)) %>%
  unnest(models_tidy)
#> # A tibble: 8 x 9
#>    Year Acquirer.Indust~ data  models term  estimate std.error statistic p.value
#>   <dbl> <chr>            <lis> <list> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
#> 1  2019 Industry11       <tib~ <lm>   (Int~    6.80    NaN       NaN      NaN
#> 2  2019 Industry11       <tib~ <lm>   log(~    0.502   NaN       NaN      NaN
#> 3  2019 Industry12       <tib~ <lm>   (Int~   -5.00    NaN       NaN      NaN
#> 4  2019 Industry12       <tib~ <lm>   log(~    1.42    NaN       NaN      NaN
#> 5  2018 Industry11       <tib~ <lm>   (Int~    8.59      5.95      1.44     0.386
#> 6  2018 Industry11       <tib~ <lm>   log(~    0.373     0.406     0.917    0.527
#> 7  2018 Industry12       <tib~ <lm>   (Int~   -3.63    NaN       NaN      NaN
#> 8  2018 Industry12       <tib~ <lm>   log(~    1.38    NaN       NaN      NaN
Created on 2020-10-28 by the reprex package (v0.3.0)
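A side note on the NaN entries in the std.error, statistic and p.value columns above: they appear in the groups that have only two usable observations for two coefficients, which leaves zero residual degrees of freedom. A minimal sketch of how you could check this per group, using a made-up data frame `toy` with hypothetical columns `g`, `x` and `y` (not your data) and a plain `y ~ x` model:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: group "A" has two rows, group "B" has four
toy <- tibble::tibble(
  g = c("A", "A", "B", "B", "B", "B"),
  x = c(1, 2, 1, 2, 3, 4),
  y = c(2.1, 3.9, 1.2, 1.9, 3.2, 3.8)
)

toy_nest <- toy %>%
  nest(data = -g) %>%
  mutate(
    n_obs  = map_int(data, nrow),                # rows available per group
    models = map(data, ~ lm(y ~ x, data = .x)),  # one model per group
    df_res = map_dbl(models, df.residual)        # residual degrees of freedom
  )

# Groups with df_res == 0 cannot give standard errors
toy_nest %>%
  select(g, n_obs, df_res)
```

Here group "A" ends up with zero residual degrees of freedom, so its standard errors cannot be estimated, while group "B" has two.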
If you want to know more about this workflow, have a look at the "Many models" chapter of R for Data Science (Chapter 25). It describes the workflow concisely and gives more examples of how it can be used.
Regarding the NaNs you are getting: your sample data contains a negative value (-39800) in the Net.Income..thousands. column, and taking the logarithm of a negative number gives NaN.
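If it helps, here is a rough sketch of how you could find the affected groups and, assuming that dropping those rows is acceptable for your analysis (that is your call, not something I can judge from the sample), remove them before nesting. The data frame `toy` and its column names are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for the real data frame
toy <- tibble::tibble(
  Year       = c(2018, 2018, 2018, 2019),
  Industry   = c("A", "A", "B", "B"),
  Net.Income = c(100, -50, 200, 300)
)

# log() of a negative number returns NaN (with a warning)
suppressWarnings(log(-50))
#> [1] NaN

# Count the rows per group that cannot be log-transformed
bad_rows <- toy %>%
  group_by(Year, Industry) %>%
  summarise(n_nonpositive = sum(Net.Income <= 0), .groups = "drop")

# Keep only rows with strictly positive net income before nesting
toy_pos <- toy %>%
  filter(Net.Income > 0)
```

Keep in mind that filtering changes which observations enter each model, so it is worth checking `bad_rows` first to see how many rows per group would be lost.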