Hello,
I'm new to R. I have a data set with 245 variables and 32 regions (geographic areas). I need to obtain the OLS model coefficients for each of the 32 areas.
I thought this code was going to work:
y<- c(MMSI_2016$escacu_inf) #dependent variable
x1<- c(MMSI_2016$escacu_pad) #independent variable
x2<- c(MMSI_2016$escacu_mad) #independent variable
edo<- c(MMSI_2016$ubica_geo_enh) #This variable goes from 1:32
predictorlist<- list("x1","x2")
for (i in edo){
model[i] <- lm(y[i] ~ x1[i] + x2[i])
summary(model_[i])
}
It runs, but it returns the result for the whole data set, not for each region (1:32).
I'd be very grateful if someone could help me.
Thanks
FJCC
June 16, 2020, 2:50am
2
You can do this cleanly with the map function from the purrr package.
See this vignette for an example.
library(dplyr)
library(purrr)
###Inventing some data
DF <- data.frame(pad = runif(100, min = 0, max = 10),
                 mad = runif(100, min = 0, max = 10),
                 noise = runif(100),
                 ubica = rep(1:10, each = 10))
#Calculate the inf column with known coef
DF <- DF %>% mutate(inf = 3 * pad - 2 * mad + 2.3 + noise)
###
FITS <- DF %>% split(.$ubica) %>%
  map(~lm(inf ~ pad + mad, data = .)) %>%
  map(summary)
FITS$`1`
#>
#> Call:
#> lm(formula = inf ~ pad + mad, data = .)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -0.47000 -0.19139  0.03641  0.20225  0.49864
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  2.90177    0.36210   8.014 9.02e-05 ***
#> pad          2.97840    0.05238  56.866 1.36e-10 ***
#> mad         -2.01318    0.04507 -44.670 7.36e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.3606 on 7 degrees of freedom
#> Multiple R-squared: 0.9986, Adjusted R-squared: 0.9983
#> F-statistic: 2570 on 2 and 7 DF, p-value: 9.28e-11
Created on 2020-06-15 by the reprex package (v0.3.0)
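If only the coefficients are needed rather than the full summaries, something like the following might be a starting point. It is a minimal sketch in base R, not part of the reprex above: it rebuilds the same kind of invented data, and the seed, the helper name fit_one, and the object name COEFS are assumptions introduced only for illustration.

```r
# Invented data with the same columns as the example above.
set.seed(1)  # assumption: seed added only so this sketch is reproducible
DF <- data.frame(pad = runif(100, min = 0, max = 10),
                 mad = runif(100, min = 0, max = 10),
                 noise = runif(100),
                 ubica = rep(1:10, each = 10))
DF$inf <- 3 * DF$pad - 2 * DF$mad + 2.3 + DF$noise

# Hypothetical helper: fit the model on one region's rows and
# return only the coefficient vector.
fit_one <- function(d) coef(lm(inf ~ pad + mad, data = d))

# split() gives one data frame per region; sapply() returns a matrix
# with one column per region, so transpose to get one row per region.
COEFS <- t(sapply(split(DF, DF$ubica), fit_one))

# Keep the region label as a regular column next to the coefficients.
COEFS <- data.frame(ubica = rownames(COEFS), COEFS, check.names = FALSE)
head(COEFS)
```

With the real data, the formula and grouping column would presumably be escacu_inf ~ escacu_pad + escacu_mad and ubica_geo_enh, passing MMSI_2016 itself as the data frame instead of copying its columns into separate vectors.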
Thanks a lot. I'll try it!