Multiple linear regression - for loop

Hi everyone! I want to do a for loop that store the value of the residual of a the regression of 10 regression, the first regression is with the 1st regressor (x1), the second with the first and second regressor (x1 and x2) .. etc

I have 10 x variables so i will have 10 residual value store in a matrix that i've already created. I don't

know how to run the regression 10 times with different regressor in each loop ... i've tried something but it's not working.. Thank you for you help!

simul.train=read.table("datasimul1_train.txt",header=T)

simul.test=read.table("datasimul1_test.txt",header=T)

x=matrix(0,nrow(simul.train),10)

for (i in 1:10)

{

x[,i]=simul.train$x^i

}

colnames(x)=c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")

simul.train=data.frame(y=simul.train$y,x)

x=matrix(0,nrow(simul.test),10)

for (i in 1:10)

{

x[,i]=simul.test$x^i

}

colnames(x)=c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10")

simul.test=data.frame(y=simul.test$y,x)

residual_sigma = matrix(data = 0, nrow = 1, ncol = 10, byrow = FALSE,

dimnames = NULL)

colnames(residual_sigma) <- c("1 var", "2 vars", "3 vars", "4 vars", "5 vars",

"6 vars", "7 vars", "8 vars", "9 vars", "10 vars")

rownames(residual_sigma) <- c("Residual of regression")

varnames <- c("x1","x2","x3","x4",

"x5","x6","x7",

"x8","x9","x10")

for (i in varnames)

{

model=lm(y~i, data = simul.train)

residual_sigma[1,match(c(i),varnames)] <- sigma(model)

}

Problems in R are best considered as application of school alegbra: f(x) = y, involving the three objects f, x and y, where object f is some function, which may be a composite function, g(f(x)), x, the object to hand (usually the data) and y, some desired object as the return value.

In this case x is a data frame containing one variable, Y and X_1 \dots X_n variables to be regressed against. lm is available as f and y is to be derived thusly

x <- seq(1:6)
y <- c(17.2,11.0,15,4,9,12)
fit <- lm(x ~ y)
summary(fit)
#> 
#> Call:
#> lm(formula = x ~ y)
#> 
#> Residuals:
#>       1       2       3       4       5       6 
#> -1.3348 -1.5732  0.2258 -0.9715  1.0273  2.6265 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)   5.7705     2.1244   2.716   0.0532 .
#> y            -0.1998     0.1751  -1.141   0.3177  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.817 on 4 degrees of freedom
#> Multiple R-squared:  0.2454, Adjusted R-squared:  0.05676 
#> F-statistic: 1.301 on 1 and 4 DF,  p-value: 0.3177
fit$residuals
#>          1          2          3          4          5          6 
#> -1.3347786 -1.5732425  0.2257665 -0.9715082  1.0272530  2.6265098

Created on 2020-09-29 by the reprex package (v0.3.0.9001)

Inspection leads to the identification of fit$residuals as the location of the desired content for each entry in y. It is a numeric vector.

There remains the construction of g() and h(). g will perform the stepwise regressions and h will capture the respective fit$residuals

The foregoing is in aid of emphasizing the functional orientation of R. Although the language has procedural/imperative constructs available, they are usually not the preferred approach. Programmers experienced in language such as the C languages and Python should consider implementing functions in those languages and making them available via {Rcpp} or {retriculate}, respectively.

Next, in selecting or constructing g, bear in mind that R is a mature language developed and maintained by and for statisticians. Except at pre-publication stages of statistical theory advances, one or more packages already exist for most problems involving statistical analysis. What's more, the basics have had 30 years' of development and scrutiny.

To begin, rseek, an R tuned front end to Google is a good source. The first hit on stepwise is the {olsrr} package vignette.

library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers
model <- lm(y ~ ., data = surgical)
ols_step_forward_p(model, details = TRUE) -> a
#> Forward Selection Method    
#> ---------------------------
#> 
#> Candidate Terms: 
#> 
#> 1. bcs 
#> 2. pindex 
#> 3. enzyme_test 
#> 4. liver_test 
#> 5. age 
#> 6. gender 
#> 7. alc_mod 
#> 8. alc_heavy 
#> 
#> We are selecting variables based on p value...
#> 
#> 
#> Forward Selection: Step 1 
#> 
#> - liver_test 
#> 
#>                           Model Summary                           
#> -----------------------------------------------------------------
#> R                       0.674       RMSE                 296.299 
#> R-Squared               0.455       Coef. Var             42.202 
#> Adj. R-Squared          0.444       MSE                87793.232 
#> Pred R-Squared          0.386       MAE                  212.857 
#> -----------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    3804272.477         1    3804272.477    43.332    0.0000 
#> Residual      4565248.060        52      87793.232                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                     Parameter Estimates                                     
#> -------------------------------------------------------------------------------------------
#>       model       Beta    Std. Error    Std. Beta      t       Sig        lower      upper 
#> -------------------------------------------------------------------------------------------
#> (Intercept)     15.191       111.869                 0.136    0.893    -209.290    239.671 
#>  liver_test    250.305        38.025        0.674    6.583    0.000     174.003    326.607 
#> -------------------------------------------------------------------------------------------
#> 
#> 
#> 
#> Forward Selection: Step 2 
#> 
#> - alc_heavy 
#> 
#>                           Model Summary                           
#> -----------------------------------------------------------------
#> R                       0.753       RMSE                 266.648 
#> R-Squared               0.567       Coef. Var             37.979 
#> Adj. R-Squared          0.550       MSE                71101.387 
#> Pred R-Squared          0.487       MAE                  187.393 
#> -----------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    4743349.776         2    2371674.888    33.356    0.0000 
#> Residual      3626170.761        51      71101.387                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                     Parameter Estimates                                      
#> --------------------------------------------------------------------------------------------
#>       model       Beta    Std. Error    Std. Beta      t        Sig        lower      upper 
#> --------------------------------------------------------------------------------------------
#> (Intercept)     -5.069       100.828                 -0.050    0.960    -207.490    197.352 
#>  liver_test    234.597        34.491        0.632     6.802    0.000     165.353    303.841 
#>   alc_heavy    342.183        94.156        0.338     3.634    0.001     153.157    531.208 
#> --------------------------------------------------------------------------------------------
#> 
#> 
#> 
#> Forward Selection: Step 3 
#> 
#> - enzyme_test 
#> 
#>                           Model Summary                           
#> -----------------------------------------------------------------
#> R                       0.812       RMSE                 238.914 
#> R-Squared               0.659       Coef. Var             34.029 
#> Adj. R-Squared          0.639       MSE                57080.128 
#> Pred R-Squared          0.567       MAE                  170.603 
#> -----------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    5515514.136         3    1838504.712    32.209    0.0000 
#> Residual      2854006.401        50      57080.128                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                      Parameter Estimates                                      
#> ---------------------------------------------------------------------------------------------
#>       model        Beta    Std. Error    Std. Beta      t        Sig        lower      upper 
#> ---------------------------------------------------------------------------------------------
#> (Intercept)    -344.559       129.156                 -2.668    0.010    -603.976    -85.141 
#>  liver_test     183.844        33.845        0.495     5.432    0.000     115.865    251.823 
#>   alc_heavy     319.662        84.585        0.315     3.779    0.000     149.769    489.555 
#> enzyme_test       6.263         1.703        0.335     3.678    0.001       2.843      9.683 
#> ---------------------------------------------------------------------------------------------
#> 
#> 
#> 
#> Forward Selection: Step 4 
#> 
#> - pindex 
#> 
#>                           Model Summary                           
#> -----------------------------------------------------------------
#> R                       0.866       RMSE                 206.584 
#> R-Squared               0.750       Coef. Var             29.424 
#> Adj. R-Squared          0.730       MSE                42676.744 
#> Pred R-Squared          0.669       MAE                  146.473 
#> -----------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6278360.060         4    1569590.015    36.779    0.0000 
#> Residual      2091160.477        49      42676.744                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                       
#> -----------------------------------------------------------------------------------------------
#>       model        Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> -----------------------------------------------------------------------------------------------
#> (Intercept)    -789.012       153.372                 -5.144    0.000    -1097.226    -480.799 
#>  liver_test     125.474        32.358        0.338     3.878    0.000       60.448     190.499 
#>   alc_heavy     359.875        73.754        0.355     4.879    0.000      211.660     508.089 
#> enzyme_test       7.548         1.503        0.404     5.020    0.000        4.527      10.569 
#>      pindex       7.876         1.863        0.335     4.228    0.000        4.133      11.620 
#> -----------------------------------------------------------------------------------------------
#> 
#> 
#> 
#> Forward Selection: Step 5 
#> 
#> - bcs 
#> 
#>                           Model Summary                           
#> -----------------------------------------------------------------
#> R                       0.884       RMSE                 195.454 
#> R-Squared               0.781       Coef. Var             27.839 
#> Adj. R-Squared          0.758       MSE                38202.426 
#> Pred R-Squared          0.700       MAE                  137.656 
#> -----------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6535804.090         5    1307160.818    34.217    0.0000 
#> Residual      1833716.447        48      38202.426                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1178.330       208.682                 -5.647    0.000    -1597.914    -758.746 
#>  liver_test       58.064        40.144        0.156     1.446    0.155      -22.652     138.779 
#>   alc_heavy      317.848        71.634        0.314     4.437    0.000      173.818     461.878 
#> enzyme_test        9.748         1.656        0.521     5.887    0.000        6.419      13.077 
#>      pindex        8.924         1.808        0.380     4.935    0.000        5.288      12.559 
#>         bcs       59.864        23.060        0.241     2.596    0.012       13.498     106.230 
#> ------------------------------------------------------------------------------------------------
#> 
#> 
#> 
#> No more variables to be added.
#> 
#> Variables Entered: 
#> 
#> + liver_test 
#> + alc_heavy 
#> + enzyme_test 
#> + pindex 
#> + bcs 
#> 
#> 
#> Final Model Output 
#> ------------------
#> 
#>                           Model Summary                           
#> -----------------------------------------------------------------
#> R                       0.884       RMSE                 195.454 
#> R-Squared               0.781       Coef. Var             27.839 
#> Adj. R-Squared          0.758       MSE                38202.426 
#> Pred R-Squared          0.700       MAE                  137.656 
#> -----------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6535804.090         5    1307160.818    34.217    0.0000 
#> Residual      1833716.447        48      38202.426                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1178.330       208.682                 -5.647    0.000    -1597.914    -758.746 
#>  liver_test       58.064        40.144        0.156     1.446    0.155      -22.652     138.779 
#>   alc_heavy      317.848        71.634        0.314     4.437    0.000      173.818     461.878 
#> enzyme_test        9.748         1.656        0.521     5.887    0.000        6.419      13.077 
#>      pindex        8.924         1.808        0.380     4.935    0.000        5.288      12.559 
#>         bcs       59.864        23.060        0.241     2.596    0.012       13.498     106.230 
#> ------------------------------------------------------------------------------------------------
a
#> 
#>                               Selection Summary                                
#> ------------------------------------------------------------------------------
#>         Variable                     Adj.                                         
#> Step      Entered      R-Square    R-Square     C(p)        AIC         RMSE      
#> ------------------------------------------------------------------------------
#>    1    liver_test       0.4545      0.4440    62.5119    771.8753    296.2992    
#>    2    alc_heavy        0.5667      0.5498    41.3681    761.4394    266.6484    
#>    3    enzyme_test      0.6590      0.6385    24.3379    750.5089    238.9145    
#>    4    pindex           0.7501      0.7297     7.5373    735.7146    206.5835    
#>    5    bcs              0.7809      0.7581     3.1925    730.6204    195.4544    
#> ------------------------------------------------------------------------------
str(a)
#> List of 11
#>  $ predictors: chr [1:5] "liver_test" "alc_heavy" "enzyme_test" "pindex" ...
#>  $ mallows_cp: num [1:5] 62.51 41.37 24.34 7.54 3.19
#>  $ indvar    : chr [1:8] "bcs" "pindex" "enzyme_test" "liver_test" ...
#>  $ rsquare   : num [1:5] 0.455 0.567 0.659 0.75 0.781
#>  $ steps     : num 5
#>  $ sbic      : num [1:5] 616 606 595 583 580
#>  $ adjr      : num [1:5] 0.444 0.55 0.639 0.73 0.758
#>  $ rmse      : num [1:5] 296 267 239 207 195
#>  $ aic       : num [1:5] 772 761 751 736 731
#>  $ sbc       : num [1:5] 778 769 760 748 745
#>  $ model     :List of 12
#>   ..$ coefficients : Named num [1:6] -1178.33 58.06 317.85 9.75 8.92 ...
#>   .. ..- attr(*, "names")= chr [1:6] "(Intercept)" "liver_test" "alc_heavy" "enzyme_test" ...
#>   ..$ residuals    : Named num [1:54] -20.99 7.46 2.2 -29.59 785.83 ...
#>   .. ..- attr(*, "names")= chr [1:54] "1" "2" "3" "4" ...
#>   ..$ effects      : Named num [1:54] -5159 1950 969 879 -873 ...
#>   .. ..- attr(*, "names")= chr [1:54] "(Intercept)" "liver_test" "alc_heavy" "enzyme_test" ...
#>   ..$ rank         : int 6
#>   ..$ fitted.values: Named num [1:54] 716 396 708 379 1557 ...
#>   .. ..- attr(*, "names")= chr [1:54] "1" "2" "3" "4" ...
#>   ..$ assign       : int [1:6] 0 1 2 3 4 5
#>   ..$ qr           :List of 5
#>   .. ..$ qr   : num [1:54, 1:6] -7.348 0.136 0.136 0.136 0.136 ...
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:54] "1" "2" "3" "4" ...
#>   .. .. .. ..$ : chr [1:6] "(Intercept)" "liver_test" "alc_heavy" "enzyme_test" ...
#>   .. .. ..- attr(*, "assign")= int [1:6] 0 1 2 3 4 5
#>   .. ..$ qraux: num [1:6] 1.14 1.13 1.05 1.22 1.01 ...
#>   .. ..$ pivot: int [1:6] 1 2 3 4 5 6
#>   .. ..$ tol  : num 1e-07
#>   .. ..$ rank : int 6
#>   .. ..- attr(*, "class")= chr "qr"
#>   ..$ df.residual  : int 48
#>   ..$ xlevels      : Named list()
#>   ..$ call         : language lm(formula = paste(response, "~", paste(preds, collapse = " + ")), data = l)
#>   ..$ terms        :Classes 'terms', 'formula'  language y ~ liver_test + alc_heavy + enzyme_test + pindex + bcs
#>   .. .. ..- attr(*, "variables")= language list(y, liver_test, alc_heavy, enzyme_test, pindex, bcs)
#>   .. .. ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
#>   .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. ..$ : chr [1:6] "y" "liver_test" "alc_heavy" "enzyme_test" ...
#>   .. .. .. .. ..$ : chr [1:5] "liver_test" "alc_heavy" "enzyme_test" "pindex" ...
#>   .. .. ..- attr(*, "term.labels")= chr [1:5] "liver_test" "alc_heavy" "enzyme_test" "pindex" ...
#>   .. .. ..- attr(*, "order")= int [1:5] 1 1 1 1 1
#>   .. .. ..- attr(*, "intercept")= int 1
#>   .. .. ..- attr(*, "response")= int 1
#>   .. .. ..- attr(*, ".Environment")=<environment: 0x55bb376aeea0> 
#>   .. .. ..- attr(*, "predvars")= language list(y, liver_test, alc_heavy, enzyme_test, pindex, bcs)
#>   .. .. ..- attr(*, "dataClasses")= Named chr [1:6] "numeric" "numeric" "numeric" "numeric" ...
#>   .. .. .. ..- attr(*, "names")= chr [1:6] "y" "liver_test" "alc_heavy" "enzyme_test" ...
#>   ..$ model        :'data.frame':    54 obs. of  6 variables:
#>   .. ..$ y          : int [1:54] 695 403 710 349 2343 348 518 749 1056 968 ...
#>   .. ..$ liver_test : num [1:54] 2.59 1.7 2.16 2.01 4.3 1.42 1.91 2.57 2.5 2.4 ...
#>   .. ..$ alc_heavy  : int [1:54] 0 0 0 0 1 0 1 0 0 0 ...
#>   .. ..$ enzyme_test: int [1:54] 81 66 83 41 115 72 63 81 93 94 ...
#>   .. ..$ pindex     : int [1:54] 62 59 57 73 65 38 46 68 67 76 ...
#>   .. ..$ bcs        : num [1:54] 6.7 5.1 7.4 6.5 7.8 5.8 5.7 3.7 6 3.7 ...
#>   .. ..- attr(*, "terms")=Classes 'terms', 'formula'  language y ~ liver_test + alc_heavy + enzyme_test + pindex + bcs
#>   .. .. .. ..- attr(*, "variables")= language list(y, liver_test, alc_heavy, enzyme_test, pindex, bcs)
#>   .. .. .. ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
#>   .. .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. .. ..$ : chr [1:6] "y" "liver_test" "alc_heavy" "enzyme_test" ...
#>   .. .. .. .. .. ..$ : chr [1:5] "liver_test" "alc_heavy" "enzyme_test" "pindex" ...
#>   .. .. .. ..- attr(*, "term.labels")= chr [1:5] "liver_test" "alc_heavy" "enzyme_test" "pindex" ...
#>   .. .. .. ..- attr(*, "order")= int [1:5] 1 1 1 1 1
#>   .. .. .. ..- attr(*, "intercept")= int 1
#>   .. .. .. ..- attr(*, "response")= int 1
#>   .. .. .. ..- attr(*, ".Environment")=<environment: 0x55bb376aeea0> 
#>   .. .. .. ..- attr(*, "predvars")= language list(y, liver_test, alc_heavy, enzyme_test, pindex, bcs)
#>   .. .. .. ..- attr(*, "dataClasses")= Named chr [1:6] "numeric" "numeric" "numeric" "numeric" ...
#>   .. .. .. .. ..- attr(*, "names")= chr [1:6] "y" "liver_test" "alc_heavy" "enzyme_test" ...
#>   ..- attr(*, "class")= chr "lm"
#>  - attr(*, "class")= chr "ols_step_forward_p"

Created on 2020-09-29 by the reprex package (v0.3.0.9001)

Inspection of the output structure will identify locations of the residuals from which h may be composed from ols_step_forward_p() as g.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.