loop for regression lm(y~x)

Can someone please point me towards right direction, my current data looks like this =>

Meter= c( Meter1, Meter 2, Meter 3........Meter 1440)

and for each meter, I have monthly electricity consumption,

Cons=c ( 6623, 11285, 21785....) like this.

I would like to cerate a simple regression model for each meter Meter 1:

lm(Cons~ HDD), Meter2: lm( Cons~HDD)----Meter1140=lm( Cons~ HDD ) 

Please see below my code this gives me an error message of

Error in model.frame.default(formula = df$Cons ~ HDD, data = data.subset, : variable lengths differ (found for 'HDD')

Please can you advise where I'm doing wrong

mod.store<- list()
unique(df$Meter)
for (i in 1:unique(df$Meter)){ 
    data.subset<-df[df$Meter==unique(df$Meter)[i],]
    mod<-lm(df$Cons~HDD,data=data.subset)
    mod.store[[i]]<-summary(mod)
}

Thanks in advance!

I think you might want to look into this vignette: https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html It gives an example of exactly what you're asking.

1 Like

Welcome to the community!

I think this error is self explanatory. While you are specifying the data argument of lm and using the formula interface, R is trying to access the x and y variables from the supplied data argument. So it tries to find HDD and df$Cons from data.subset. However, since data.subset is a subset of df, obviously it cannot have more number of rows than df, and very likely less. So, what is happening is that it is getting HDD from data.subset with less number of observations, and Cons from df with more number of data points. Do you really want to regress Cons from df with HDD from data.subset.

I think this makes much more sense:

mod <- lm(Cons ~ HDD, data = data.subset)

If it does not solve your problem, can you please provide a REPRoducible EXample of your problem? It provides more specifics of your problem, and it helps others to understand what problem you are facing.

If you don't know how to do it, take a look at this thread:

Hello Yarnabrina, Thanks for looking into this. I think I might have put df$Cons there by mistake but either way the code doesn't seem to work. If I can get the code to work,I would like to see for each model, intercept, slope and R- squared, p-val . Hope you can help.

Thank you.

Meter Date Cons HHD
1 Meter1 01/09/2017 6,623 60.3
2 Meter1 01/10/2017 11,285 88.8
3 Meter1 01/11/2017 21,725 235.0
4 Meter1 01/12/2017 32,283 299.2
5 Meter1 01/01/2018 30,439 299.2
6 Meter1 01/02/2018 31,479 326.3
7 Meter1 01/03/2018 34,029 312.6
8 Meter1 01/04/2018 18,741 152.9
9 Meter1 01/05/2018 10,036 78.8
10 Meter1 01/06/2018 4,178 26.3
11 Meter1 01/07/2018 740 6.6
12 Meter2 01/09/2017 54,504 60.3
13 Meter2 01/10/2017 86,680 88.8
14 Meter2 01/11/2017 122,111 235.0
15 Meter2 01/12/2017 108,195 299.2
16 Meter2 01/01/2018 131,434 299.2
17 Meter2 01/02/2018 146,643 326.3
18 Meter2 01/03/2018 160,782 312.6
19 Meter2 01/04/2018 118,632 152.9
20 Meter2 01/05/2018 83,731 78.8
21 Meter2 01/06/2018 57,473 26.3
22 Meter2 01/07/2018 50,314 6.6
23 Meter3 01/09/2017 63,220 60.3
24 Meter3 01/10/2017 84,384 88.8
25 Meter3 01/11/2017 158,851 235.0
26 Meter3 01/12/2017 182,480 299.2
27 Meter3 01/01/2018 179,076 299.2
28 Meter3 01/02/2018 170,125 326.3
29 Meter3 01/03/2018 168,682 312.6
30 Meter3 01/04/2018 120,988 152.9
31 Meter3 01/05/2018 85,762 78.8
32 Meter3 01/06/2018 60,264 26.3
33 Meter3 01/07/2018 53,301 6.6
34 Meter4 01/09/2017 30,774 60.3
35 Meter4 01/10/2017 43,452 88.8
36 Meter4 01/11/2017 70,170 235.0
37 Meter4 01/12/2017 86,572 299.2
38 Meter4 01/01/2018 87,670 299.2
39 Meter4 01/02/2018 83,945 326.3
40 Meter4 01/03/2018 80,888 312.6
41 Meter4 01/04/2018 55,462 152.9
42 Meter4 01/05/2018 27,415 78.8
43 Meter4 01/06/2018 15,767 26.3
44 Meter4 01/07/2018 12,696 6.6

Thanks Steph for looking this I will check your link.

Here's a code to extract all those information. You can try to use broom as @StatSteph suggested as well to have a concise table.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dataset <- read.table(text = "Meter Date Cons HHD
1 Meter1 01/09/2017 6,623 60.3
2 Meter1 01/10/2017 11,285 88.8
3 Meter1 01/11/2017 21,725 235.0
4 Meter1 01/12/2017 32,283 299.2
5 Meter1 01/01/2018 30,439 299.2
6 Meter1 01/02/2018 31,479 326.3
7 Meter1 01/03/2018 34,029 312.6
8 Meter1 01/04/2018 18,741 152.9
9 Meter1 01/05/2018 10,036 78.8
10 Meter1 01/06/2018 4,178 26.3
11 Meter1 01/07/2018 740 6.6
12 Meter2 01/09/2017 54,504 60.3
13 Meter2 01/10/2017 86,680 88.8
14 Meter2 01/11/2017 122,111 235.0
15 Meter2 01/12/2017 108,195 299.2
16 Meter2 01/01/2018 131,434 299.2
17 Meter2 01/02/2018 146,643 326.3
18 Meter2 01/03/2018 160,782 312.6
19 Meter2 01/04/2018 118,632 152.9
20 Meter2 01/05/2018 83,731 78.8
21 Meter2 01/06/2018 57,473 26.3
22 Meter2 01/07/2018 50,314 6.6
23 Meter3 01/09/2017 63,220 60.3
24 Meter3 01/10/2017 84,384 88.8
25 Meter3 01/11/2017 158,851 235.0
26 Meter3 01/12/2017 182,480 299.2
27 Meter3 01/01/2018 179,076 299.2
28 Meter3 01/02/2018 170,125 326.3
29 Meter3 01/03/2018 168,682 312.6
30 Meter3 01/04/2018 120,988 152.9
31 Meter3 01/05/2018 85,762 78.8
32 Meter3 01/06/2018 60,264 26.3
33 Meter3 01/07/2018 53,301 6.6
34 Meter4 01/09/2017 30,774 60.3
35 Meter4 01/10/2017 43,452 88.8
36 Meter4 01/11/2017 70,170 235.0
37 Meter4 01/12/2017 86,572 299.2
38 Meter4 01/01/2018 87,670 299.2
39 Meter4 01/02/2018 83,945 326.3
40 Meter4 01/03/2018 80,888 312.6
41 Meter4 01/04/2018 55,462 152.9
42 Meter4 01/05/2018 27,415 78.8
43 Meter4 01/06/2018 15,767 26.3
44 Meter4 01/07/2018 12,696 6.6",
                      header = TRUE)

dataset %>%
    mutate(Cons = as.numeric(x = gsub(pattern = ",",
                                      replacement = "",
                                      x = Cons,
                                      fixed = TRUE))) %>%
    group_by(Meter) %>%
    group_map(.f = ~ summary(object = lm(formula = Cons ~ HHD,
                                         data = .x)))
#> [[1]]
#> 
#> Call:
#> lm(formula = Cons ~ HHD, data = .x)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#>  -2832  -1104     94   1249   2238 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 1504.010    917.992   1.638    0.136    
#> HHD           98.099      4.393  22.330 3.43e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1740 on 9 degrees of freedom
#> Multiple R-squared:  0.9823, Adjusted R-squared:  0.9803 
#> F-statistic: 498.6 on 1 and 9 DF,  p-value: 3.429e-09
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = Cons ~ HHD, data = .x)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -29446  -5793   1413   7891  21965 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 53845.28    8252.02   6.525 0.000108 ***
#> HHD           280.06      39.49   7.092 5.72e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 15640 on 9 degrees of freedom
#> Multiple R-squared:  0.8482, Adjusted R-squared:  0.8313 
#> F-statistic: 50.29 on 1 and 9 DF,  p-value: 5.718e-05
#> 
#> 
#> [[3]]
#> 
#> Call:
#> lm(formula = Cons ~ HHD, data = .x)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -14366.3  -6173.4    622.1   6873.8  12002.7 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 49957.67    4928.12   10.14 3.19e-06 ***
#> HHD           412.30      23.58   17.48 2.96e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 9343 on 9 degrees of freedom
#> Multiple R-squared:  0.9714, Adjusted R-squared:  0.9682 
#> F-statistic: 305.6 on 1 and 9 DF,  p-value: 2.965e-08
#> 
#> 
#> [[4]]
#> 
#> Call:
#> lm(formula = Cons ~ HHD, data = .x)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#>  -5816  -5078   1451   3605   8428 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 14558.00    2872.05   5.069 0.000673 ***
#> HHD           230.47      13.74  16.768 4.27e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 5445 on 9 degrees of freedom
#> Multiple R-squared:  0.969,  Adjusted R-squared:  0.9655 
#> F-statistic: 281.2 on 1 and 9 DF,  p-value: 4.27e-08

Created on 2020-01-12 by the reprex package (v0.3.0)

Thanks Yarnabrina.

I have tried the link StatSteph suggested and somehow managed to get it worked. But, your way also good and short so many thanks for this, really appreciated.
Since, I have 1440 meters in my actual data set, can you please also advise with your way if it is possible to extract for each meter's model ( p-values, R-squared, intercept and slope values in one table?

By using the link Step advised , I produced the code below but in here R only displays 10 rows, is there a way I can set to see all the extract. My main goal is ; for each meter in total 1140 meters I want to extract slope, intercept, p val and R-squared values only but R only displays and 10 rows.

Thanks

regressions <- data %>%
nest(-Meter) %>%
mutate(
fit = map(data, ~lm(as.numeric(Cons) ~as.numeric( HHD), data = .x)),
tidied = map(fit, tidy),
glanced = map(fit, glance),
augmented = map(fit, augment)
)

regressions %>%
unnest(tidied)

regressions %>%
unnest(glanced, .drop = TRUE)

regressions %>%
unnest(augmented)

4 posts were merged into an existing topic: lm(y~x )model, R only displays first 10 rows, how to get remaining results see below