How can I plot the betas and the R2 (goodness of fit) of many regression lines

jia1999 · November 15, 2020, 12:50pm

I made 30 regression lines and I get different betas and r2, and I wan to make a graph with it.

So I have
model 1, that the data goes from 1870-1874
model 2, that data goes from 1875-1879
....Model 30, that data goes from 2015-2019

so what I want is to extract the coefficient ( just the beta, not the intercept) of each of the 30 regression line and plot it in a linear graph

I did the following script;

TFG2<-read.csv("TFG2.csv", stringsAsFactors = FALSE,
na.strings = c("NA", "#DIV/0!"))

Look at the year range for each country

lapply(unique(TFG2$country), function(x) range(TFG2$year[TFG2$country == x]))

All countries have data from 1870 to 2016

Group years:

y0 <- seq(1870, 2016, by = 5) ## start points of intervals
y1 <- seq(y0[2]-1, 2020, by = 5) ## end points of intervals

Average

aux <- lapply(unique(TFG2$country), function(c){
d <- TFG2[TFG2$country == c,]
res <- lapply(1:length(y0), function(i){
yy0 <- y0[i]
yy1 <- y1[i]
dd <- d[d$year >= yy0 & d$year <= yy1,]
data.frame(country = c,
period = paste0(yy0, "-", yy1),
t(apply(dd[,-c(1,2)], 2, mean, na.rm = TRUE)))
})
res <- as.data.frame(do.call("rbind", res))
})
TFG3 <- do.call("rbind", aux)

Linear model for each period

mod.period <- lapply(unique(TFG3$period), function(p){
d <- TFG3[TFG3$period == p,]
lm(iy ~ say, data = d)
})

Summary of the models

lapply(mod.period, summary)

FJCC · November 15, 2020, 1:48pm

The package broom has functions for extracting neat data frames from model fits.

set.seed(123)
TFG3 <- data.frame(period=rep(LETTERS[1:10],each=10),X=rep(1:10,10),Value=rnorm(100))
Periods <- unique(TFG3$period)
mod.period <- lapply(Periods, function(p){
  d <- TFG3[TFG3$period == p,]
  lm(Value ~ X, data = d)
})
names(mod.period) <- Periods

library(broom)
library(dplyr, warn.conflicts = FALSE)

INFO <- lapply(mod.period,tidy)
INFOdf <- bind_rows(INFO, .id = "Period") %>% filter(term=="X")
head(INFOdf)
#> # A tibble: 6 x 6
#>   Period term  estimate std.error statistic p.value
#>   <chr>  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
#> 1 A      X      -0.0820    0.108     -0.762   0.468
#> 2 B      X      -0.129     0.112     -1.14    0.286
#> 3 C      X       0.145     0.0958     1.52    0.167
#> 4 D      X      -0.0802    0.0547    -1.47    0.181
#> 5 E      X       0.0386    0.126      0.307   0.767
#> 6 F      X      -0.0190    0.0998    -0.191   0.853


INFO2 <- lapply(mod.period,glance)
INFO2df <- bind_rows(INFO2, .id = "Period")
head(INFO2df)
#> # A tibble: 6 x 13
#>   Period r.squared adj.r.squared sigma statistic p.value    df logLik   AIC
#>   <chr>      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl>
#> 1 A        0.0677        -0.0488 0.977    0.581    0.468     1 -12.8   31.7
#> 2 B        0.141          0.0332 1.02     1.31     0.286     1 -13.3   32.6
#> 3 C        0.224          0.127  0.870    2.30     0.167     1 -11.7   29.4
#> 4 D        0.212          0.113  0.496    2.15     0.181     1  -6.07  18.1
#> 5 E        0.0116        -0.112  1.14     0.0942   0.767     1 -14.4   34.8
#> 6 F        0.00453       -0.120  0.906    0.0364   0.853     1 -12.1   30.2
#> # ... with 4 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>,
#> #   nobs <int>

^{Created on 2020-11-15 by the reprex package (v0.3.0)}

system · December 6, 2020, 1:48pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.