Coefficient of determination for each month

I have two datasets, as in the link.
I am trying to calculate the coefficient of determination for Tmax for each month, i.e. Jan-Dec.
How could I do it?
Thanks

library(readr)
df <- read_csv("df.csv", col_names = TRUE)
head(df)
# A tibble: 6 x 4
  datetime             Tmax month  year
  <dttm>              <dbl> <dbl> <dbl>
1 1976-09-24 15:00:00 20.3      9  1976
2 1976-10-06 09:15:00 15.2     10  1976
3 1976-11-27 16:00:00 17.8     11  1976
4 1976-12-06 15:00:00  2.54    12  1976
5 1977-01-09 20:45:00  7.62     1  1977
6 1977-02-24 00:30:00 20.3      2  1977
df1 <- read_csv("df1.csv", col_names = TRUE)
head(df1)
# A tibble: 6 x 4
  DateTime             Tmax month  year
  <dttm>              <dbl> <dbl> <dbl>
1 1976-09-24 15:45:00  17.2     9  1976
2 1976-10-06 09:45:00  11.0    10  1976
3 1976-11-28 11:00:00  15      11  1976
4 1976-12-06 15:00:00   4      12  1976
5 1977-01-14 08:00:00   9       1  1977
6 1977-02-24 00:15:00  12       2  1977

This is more of a methodological problem than a coding problem. There are at least two issues.

  1. Six points may not be very informative.
  2. Sequentially ordered observations *may* present autocorrelation issues, which would violate the independence assumption behind the inference that accompanies R^2.
x <- 1:6
y <- c(17.2,11.0,15,4,9,12)
plot(x,y)

fit <- lm(x ~ y)
summary(fit)
#> 
#> Call:
#> lm(formula = x ~ y)
#> 
#> Residuals:
#>       1       2       3       4       5       6 
#> -1.3348 -1.5732  0.2258 -0.9715  1.0273  2.6265 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)   5.7705     2.1244   2.716   0.0532 .
#> y            -0.1998     0.1751  -1.141   0.3177  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.817 on 4 degrees of freedom
#> Multiple R-squared:  0.2454, Adjusted R-squared:  0.05676 
#> F-statistic: 1.301 on 1 and 4 DF,  p-value: 0.3177
par(mfrow = c(2,2))
plot(fit)

gvlma::gvlma(fit)
#> 
#> Call:
#> lm(formula = x ~ y)
#> 
#> Coefficients:
#> (Intercept)            y  
#>      5.7705      -0.1998  
#> 
#> 
#> ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
#> USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
#> Level of Significance =  0.05 
#> 
#> Call:
#>  gvlma::gvlma(x = fit) 
#> 
#>                     Value p-value                Decision
#> Global Stat        3.0222  0.5541 Assumptions acceptable.
#> Skewness           0.3768  0.5393 Assumptions acceptable.
#> Kurtosis           0.2369  0.6264 Assumptions acceptable.
#> Link Function      1.6805  0.1949 Assumptions acceptable.
#> Heteroscedasticity 0.7279  0.3936 Assumptions acceptable.

Created on 2020-09-29 by the reprex package (v0.3.0.9001)
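
The autocorrelation concern in point 2 can be probed directly on the residuals. The following is a minimal sketch, reusing x, y, and fit from the reprex above and only base R's stats functions (acf and Box.test). With six observations the result is illustrative at best; the same check becomes more meaningful on the full series.

```r
# Data and model from the reprex above.
x <- 1:6
y <- c(17.2, 11.0, 15, 4, 9, 12)
fit <- lm(x ~ y)

res <- resid(fit)

# Lag-1 sample autocorrelation of the residuals
# (element 1 of $acf is lag 0, element 2 is lag 1).
lag1 <- acf(res, lag.max = 1, plot = FALSE)$acf[2]

# Ljung-Box test of the null hypothesis of no autocorrelation at lag 1.
lb <- Box.test(res, lag = 1, type = "Ljung-Box")

lag1
lb$p.value
```

A small p-value would suggest serially correlated residuals, in which case the standard errors and p-values reported by summary(fit) should not be taken at face value.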


Thanks for replying @technocrat .
Actually, I was looking to estimate R^2 for Tmax between the two data frames df and df1 for each month.
The header was inserted only to show the input data. The actual file was attached through the link.
Next time, I will try to be more clear.

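If Tmax in df and df1 are to be modeled against each other, just fit a linear model of one on the other. For example, with the built-in mtcars data: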

lm(mtcars$mpg ~ mtcars$drat)
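
Applied to the question, one possible approach is to pair the two series and compute R^2 within each calendar month. The sketch below is only that: it assumes each data frame holds one Tmax per year-month and that rows can be matched on the year and month columns, which the posted head() output suggests but does not confirm. The data here are synthetic stand-ins, not the poster's files, and the names grid, both, and r2_by_month are made up for illustration.

```r
# Synthetic stand-ins for df and df1 (hypothetical values, not the real data):
# one Tmax per year-month for 1976-1985.
set.seed(1)
grid <- expand.grid(month = 1:12, year = 1976:1985)
df  <- data.frame(grid, Tmax = 15 + 10 * sin(grid$month / 12 * 2 * pi) + rnorm(nrow(grid)))
df1 <- data.frame(grid, Tmax = df$Tmax + rnorm(nrow(grid), sd = 2))

# Pair the two series on year and month; the suffixes keep the two
# Tmax columns apart (Tmax_df and Tmax_df1).
both <- merge(df, df1, by = c("year", "month"), suffixes = c("_df", "_df1"))

# For a simple linear regression, R^2 equals the squared Pearson correlation,
# so it can be computed per month without fitting twelve models by hand.
r2_by_month <- sapply(split(both, both$month),
                      function(d) cor(d$Tmax_df, d$Tmax_df1)^2)

r2_by_month
```

The result is a named vector with one R^2 per month (names "1" to "12"). Calling summary(lm(Tmax_df1 ~ Tmax_df, data = d))$r.squared on each monthly subset gives the same numbers and also exposes the rest of the model summary. The earlier caveats still apply: each monthly R^2 rests on only as many points as there are years in the record.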
