For-Loop on a Multivariate regression model in Rstudio

Hi guys,

I am trying to do a loop for a regression model on energy data. There are columns for state, MSN(Type of energy) and the years from 1960-1990 in 5 year steps and 1991 to 2017 yearly. I tested my model for 1960 and it worked perfectly. However I have a problem doing the "for"-Loop to do the regressions on the columns/years 1961-2017. Heres some view on the Dataset itself:
Data

Thats the problematic passage of the code with the model for 1960 working fine!

Model_1960 <- lm(log(X1960[which(MSN=="KILOM")]+1) ~ log(X1960[which(MSN=="PAACB")]+1) + log(X1960[which(MSN=="NGACB")] + X1960[which(MSN=="PQACB")] +1) + log(X1960[which(MSN=="EMACB")] + X1960[which(MSN=="ESACB")] +1) , data=Data_Total)
summary(Model_1960)

df_list = colnames(Data_Total[3:36])

for(i in df_list){
  df_i = mutate(get(i), FUN)
  Model_i <- lm(log(i[which(MSN=="KILOM")]+1) ~ log(i[which(MSN=="PAACB")]+1) + log(i[which(MSN=="NGACB")] + i[which(MSN=="PQACB")] +1) + log(i[which(MSN=="EMACB")] + i[which(MSN=="ESACB")] +1) , data=Data_Total)
  summary(Model_i) 
}

sapply(colnames(Data_Total[3:36]), function(x) {
  Model <- lm(log("x"[which(MSN=="KILOM")]+1) ~ log("x"[which(MSN=="PAACB")]+1) + log("x"[which(MSN=="NGACB")] + "x"[which(MSN=="PQACB")] +1) + log("x"[which(MSN=="EMACB")] + "x"[which(MSN=="ESACB")] +1) , data=Data_Total)
  summary(Model)} )

Error codes are:

> df_list = colnames(Data_Total[3:36])
> 
> for(i in df_list){
+   df_i = mutate(get(i), FUN)
+   Model_i <- lm(log(i[which(MSN=="KILOM")]+1) ~ log(i[which(MSN=="PAACB")]+1) + log(i[which(MSN=="NGACB")] + i[which(MSN=="PQACB")] +1) + log(i[which(MSN=="EMACB")] + i[which(MSN=="ESACB")] +1) , data=Data_Total)
+   summary(Model_i) 
+ }
Error in get(i) : object 'X1960' not found
> 
> sapply(colnames(Data_Total[3:36]), function(x) {
+   Model <- lm(log("x"[which(MSN=="KILOM")]+1) ~ log("x"[which(MSN=="PAACB")]+1) + log("x"[which(MSN=="NGACB")] + "x"[which(MSN=="PQACB")] +1) + log("x"[which(MSN=="EMACB")] + "x"[which(MSN=="ESACB")] +1) , data=Data_Total)
+   summary(Model)} )
Error in "x"[which(MSN == "KILOM")] + 1 : 
  non-numeric argument to binary operator
Called from: eval(predvars, data, env)

I hope anybody could help me!

This is usually an indication that your data is not in a shape appropriate for analysis.

this is why spreadsheets work ok for some simple analysis but not for more complex ones.

You may find that time spent in the beginning to reshape your data can make lots of visual and modeling efforts a lot easier. You can find a lot of resources by searching for [wide versus long data in r] or some variants of that. Here is one quick thing I found that could give you a start.

Enjoy the journey!

So guys, I guess that would be a reproducible example for my problem.
I filtered for 6 different years and 5 of the total 52 states.

  State   MSN       X1960       X1965       X1970       X1975       X1980       X1985
1     AK EMACB        0.00        0.00        0.00        0.00        0.00        0.00
2     AK ESACB        0.00        0.00        0.00        0.00        0.00        0.00
3     AK NGACB        2.00        0.00    17393.00       94.00      129.00     5178.00
4     AK PAACB    27051.00    34357.00    58980.00    79569.00    89675.00   148753.00
5     AK PQACB        0.00        0.00        0.00        0.00        0.00        0.00
6     AK TEACB    27139.00    34379.00    76386.00    79665.00    89804.00   153931.00
7     DC EMACB        0.00        0.00        0.00        0.00        0.00        0.00
8     DC ESACB      110.00        0.00        0.00        0.00      363.00      442.00
9     DC NGACB        9.00        0.00       20.00       22.00        0.00      382.00
10    DC PAACB    28225.00    33810.00    32811.00    37046.00    26157.00    26365.00
11    DC PQACB        0.00        0.00        0.00        0.00        0.00        0.00
12    DC TEACB    28823.00    33820.00    32848.00    37069.00    27391.00    28202.00
13    FL EMACB        0.00        0.00        0.00        0.00        0.00     3718.00
14    FL ESACB        0.00        0.00        0.00        0.00        0.00       62.00
15    FL NGACB     1035.00     2623.00     4538.00     2500.00     3921.00     4288.00
16    FL PAACB   346966.00   454089.00   603814.00   745189.00   946667.00   949500.00
17    FL PQACB        0.00        0.00        0.00        0.00        0.00        0.00
18    FL TEACB   348001.00   456712.00   608352.00   747689.00   950588.00   957710.00
19    US EMACB        0.00        0.00        0.00        0.00        0.00    49509.00
20    US ESACB    10460.00     9974.00    10627.00    10150.00    11069.00    14148.00
21    US NGACB   359223.00   517864.00   740413.00   594622.00   649919.00   520729.00
22    US PAACB 10125440.00 11866228.00 15310548.00 17615388.00 19009268.00 19472382.00
23    US PQACB        0.00        0.00        0.00        0.00        0.00        0.00
24    US TEACB 10596612.00 12434010.00 16094212.00 18245049.00 19696846.00 20089174.00
25    WY EMACB        0.00        0.00        0.00        0.00        0.00        2.00
26    WY ESACB        0.00        0.00        0.00        0.00        0.00        0.00
27    WY NGACB     1816.00     2046.00     6045.00     4923.00     6204.00     5221.00
28    WY PAACB    39464.00    42247.00    51354.00    61142.00    82237.00    63578.00
29    WY PQACB        0.00        0.00        0.00        0.00        0.00        0.00
30    WY TEACB    41322.00    44301.00    57405.00    66067.00    88441.00    68801.00
31    AK KILOM     3109.91     4116.95     8768.55     9585.26    11498.09    22330.51
32    DC KILOM     3302.89     4050.01     3770.71     4460.13     3507.02     4091.22
33    FL KILOM    39878.15    54692.16    69834.35    89961.66   121708.95   138933.38
34    US KILOM  1214287.84  1488997.55  1847497.68  2195237.40  2521894.06  2914302.75
35    WY KILOM     4735.17     5305.13     6589.67     7949.16    11323.58     9980.85

I guess the code for reading and preparing the data are irrelevant for the purpose of this.
The columns "State" and "MSN" are factors. All other are numeric. I called this Dataframe "Data_Total".

Here are the libraries used in my program, I just copied all of them from my "introduction to R" class in university.

library(moments)
library(tidyverse)
library(broom)
library(wooldridge)
library(magrittr)
library(margins)
library(car)
library(lmtest)
library(stargazer)
library(clubSandwich)
library(plm)
library(lm.beta)

This is the regression model for 1960 that is consistent with my results for this year from excel.

#I added 1 to each of the factors to not have the problem of LN(0) which is not possible. That does
#not represent a problem for my prupose.

Model_1960 <- lm(log(X1960[which(MSN=="KILOM")]+1) ~ log(X1960[which(MSN=="PAACB")]+1) + log(X1960[which(MSN=="NGACB")] 
                + X1960[which(MSN=="PQACB")] +1) + log(X1960[which(MSN=="EMACB")] + X1960[which(MSN=="ESACB")] +1) , data=Data_Total)
summary(Model_1960)

Then what I want to do is run this regression for every column (X1960 to X1990) in order to get the estimates for the respective years and safe them in a table to use them later.

#Thats the main problem I have. I want to use a "For-Loop" to run the regression model on all of the years. I have
#1960-1990 every five years as the columns of the total data set and from 1991 to 2017 yearly as in the columns.
#I also would like to store the results of the regression model of each year to later export them to excel or reuse them
#in other models.

df_list = colnames(Data_Total[3:8])
for(i in df_list){
  Model_i <- lm(log([i][which(MSN=="KILOM")]+1) ~ log([i][which(MSN=="PAACB")]+1) + log([i][which(MSN=="NGACB")] + [i][which(MSN=="PQACB")] +1) + log([i][which(MSN=="EMACB")] + [i][which(MSN=="ESACB")] +1) , data=Data_Total)
  summary(Model_i) 
  coeff <- summary(Model_i)$coefficients[1:4, 1:4]
}

I hope that this is all the information that is needed in order to reproduce my problem and hopefully resolve it!

Wrong guess. Except from a screenshot of data to copy paste from console output, you haven't changed much. We need to create that data in our environment, and you cannot expect us to type it ourselves.

Your code is supposed to be minimal. Have you used all these packages for the problem for which you created this thread?Most probably not. I guess not a single one is used, but you may have used readr which is part of tidyverse.

I disagree. Reproducible means we will have the exact same setup in our environment as you have. For this, we can guess that may be read data from CSV using read.csv or something, but there is no way for use to be sure about that.


Try something like this:

with(data = Data_Total,
     expr =
         {
             for (column_name in names(x = Data_Total)[3:8])
             {
                 y <- log(x = Data_Total[[column_name]][MSN == "KILOM"] + 1)
                 x1 <- log(x = Data_Total[[column_name]][MSN == "PAACB"] + 1)
                 x2 <- log(x = Data_Total[[column_name]][MSN == "NGACB"] + 1)
                 x3 <- log(x = Data_Total[[column_name]][MSN == "PQACB"] + 1)
                 x4 <- log(x = Data_Total[[column_name]][MSN == "EMACB"] + 1)
                 x5 <- log(x = Data_Total[[column_name]][MSN == "ESACB"] + 1)
                 model <- lm(formula = y ~ x1 + x2 + x3 + x4 + x5)
                 assign(x = paste0("Coefficients_", substr(x = column_name,
                                                           start = 2,
                                                           stop = 5)),
                        value = summary(object = model)$coefficients,
                        envir = .GlobalEnv)
             }
         })

mget(x = ls(pattern = "Coefficients_"))

Hello Yarnabrina,

I'm sorry that I could not meet the requirements that you requested.
As I'm totally new to R and this community I did not completely understand what you asked for.
However I tried your approach for my model and it worked out perfectly!
Thank you very much for your help and sacrificing your free time to resolve my issue!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.