Linear regression model for a timeseries per gridcell in R

ess · June 1, 2021, 6:33pm

I have a dataframe that contains coordinates describing each one gridcell, year indications and several variables which I want to include in my linear regression. I want to perform a simple linear regression for a timeseries of 30 years per gridcell (x,y coordinate pair). So far I grouped the dataset by x and y and nested all other variables so that each row contains one column each with x and y coordinate and one column containing a data rame with the independent and predictor variables and one variable indicating the year

first I load the respective dataframe "df" as rds

then I group the dataset and nest all values per coordinate pair

model<-df %>% 
  mutate_at('year',as.numeric) %>%  
  dplyr::select(-year) %>% 
  group_by(x,y) %>% 
  nest() %>%
  lm(indep_var~.,data =df) #this is where it does not work.

I also tried lm(indep_var~.,data =df$data) since the column where all variables are nested in within df is called data but this does not work either. The first option gives the error Error in model.frame.default(formula = ., data = df, subset = indep_var ~ : invalid type (closure) for the variable 'data' . The second option gives the error Error in eval(predvars, data, env) : object 'x' not found

arthur.t · June 1, 2021, 7:02pm

Hard to be sure because I'm not familiar with your data. But one thing I notice is that the pipe to lm will use that data frame as the first argument to lm. The first argument to lm should be a formula and the second should be the data.

If I understand what you're trying to do, this will be useful:

library(tidyverse)
library(broom)

byspecies <- iris %>%
  group_by(Species) %>%
  summarize(model = list(lm(Sepal.Length ~ Sepal.Width, data = across()))) %>%
  mutate(coef = map(model, tidy))

byspecies %>% select(Species, coef) %>% unnest(coef) %>% print()
#> # A tibble: 6 x 6
#>   Species    term        estimate std.error statistic  p.value
#>   <fct>      <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 setosa     (Intercept)    2.64     0.310       8.51 3.74e-11
#> 2 setosa     Sepal.Width    0.690    0.0899      7.68 6.71e-10
#> 3 versicolor (Intercept)    3.54     0.563       6.29 9.07e- 8
#> 4 versicolor Sepal.Width    0.865    0.202       4.28 8.77e- 5
#> 5 virginica  (Intercept)    3.91     0.757       5.16 4.66e- 6
#> 6 virginica  Sepal.Width    0.902    0.253       3.56 8.43e- 4

byspecies %>% with(model %>% setNames(Species)) %>% map(summary) %>% print()
#> $setosa
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = across())
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.52476 -0.16286  0.02166  0.13833  0.44428 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.6390     0.3100   8.513 3.74e-11 ***
#> Sepal.Width   0.6905     0.0899   7.681 6.71e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.2385 on 48 degrees of freedom
#> Multiple R-squared:  0.5514, Adjusted R-squared:  0.542 
#> F-statistic: 58.99 on 1 and 48 DF,  p-value: 6.71e-10
#> 
#> 
#> $versicolor
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = across())
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.73497 -0.28556 -0.07544  0.43666  0.83805 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   3.5397     0.5629   6.289 9.07e-08 ***
#> Sepal.Width   0.8651     0.2019   4.284 8.77e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.4436 on 48 degrees of freedom
#> Multiple R-squared:  0.2766, Adjusted R-squared:  0.2615 
#> F-statistic: 18.35 on 1 and 48 DF,  p-value: 8.772e-05
#> 
#> 
#> $virginica
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = across())
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.26067 -0.36921 -0.03606  0.19841  1.44917 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   3.9068     0.7571   5.161 4.66e-06 ***
#> Sepal.Width   0.9015     0.2531   3.562 0.000843 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.5714 on 48 degrees of freedom
#> Multiple R-squared:  0.2091, Adjusted R-squared:  0.1926 
#> F-statistic: 12.69 on 1 and 48 DF,  p-value: 0.0008435

^{Created on 2021-06-01 by the reprex package (v1.0.0)}

ess · June 1, 2021, 7:38pm

thanks, it is helping for sure. I think I almost got it implemented but what is tidy in the very last row of your code?

arthur.t · June 1, 2021, 8:48pm

Sure. tidy is a function from the broom package that's a popular way to get data frames of coefficients. It's compatible with a variety of model types. coef() will give a named vector of coefficients and so tidy is a convenient alternative. When we map(model, tidy) we loop over the models we made and make a data frame of each. Then when we byspecies %>% select(Species, coef) %>% unnest(coef) %>% print() we print the data frame that includes the coefficients across the various models.

EconProf · June 1, 2021, 9:16pm

I have been using dplyr's new nest_by( ) function:

library(tidyverse)
library(broom)

byspecies <- iris %>%
  nest_by(Species) %>%    # a tibble with grouped data in a <list<tibble>> named "data"
  mutate(model = list(lm(Sepal.Length ~ Sepal.Width, data = data))) 

byspecies %>% 
  summarise(tidy(model)) %>%
  ungroup()
#> # A tibble: 6 x 6
#>   Species    term        estimate std.error statistic  p.value
#>   <fct>      <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 setosa     (Intercept)    2.64     0.310       8.51 3.74e-11
#> 2 setosa     Sepal.Width    0.690    0.0899      7.68 6.71e-10
#> 3 versicolor (Intercept)    3.54     0.563       6.29 9.07e- 8
#> 4 versicolor Sepal.Width    0.865    0.202       4.28 8.77e- 5
#> 5 virginica  (Intercept)    3.91     0.757       5.16 4.66e- 6
#> 6 virginica  Sepal.Width    0.902    0.253       3.56 8.43e- 4

byspecies %>% 
  with(model %>% setNames(Species)) %>% 
  map(summary)
#> $setosa
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.52476 -0.16286  0.02166  0.13833  0.44428 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.6390     0.3100   8.513 3.74e-11 ***
#> Sepal.Width   0.6905     0.0899   7.681 6.71e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.2385 on 48 degrees of freedom
#> Multiple R-squared:  0.5514, Adjusted R-squared:  0.542 
#> F-statistic: 58.99 on 1 and 48 DF,  p-value: 6.71e-10
#> 
#> 
#> $versicolor
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.73497 -0.28556 -0.07544  0.43666  0.83805 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   3.5397     0.5629   6.289 9.07e-08 ***
#> Sepal.Width   0.8651     0.2019   4.284 8.77e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.4436 on 48 degrees of freedom
#> Multiple R-squared:  0.2766, Adjusted R-squared:  0.2615 
#> F-statistic: 18.35 on 1 and 48 DF,  p-value: 8.772e-05
#> 
#> 
#> $virginica
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.26067 -0.36921 -0.03606  0.19841  1.44917 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   3.9068     0.7571   5.161 4.66e-06 ***
#> Sepal.Width   0.9015     0.2531   3.562 0.000843 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.5714 on 48 degrees of freedom
#> Multiple R-squared:  0.2091, Adjusted R-squared:  0.1926 
#> F-statistic: 12.69 on 1 and 48 DF,  p-value: 0.0008435

^{Created on 2021-06-01 by the reprex package (v2.0.0)}

system · June 22, 2021, 9:17pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.