Applying splines to groups within one column

Hi! I have a dataset that has a response value for each year for each “id”. I want to detrend the response variable (“response”) using 3 different splines (3yr, 5yr, 10yr). I’d like to end up with one value, per ID, per year, per spline, so that the final data would have the following columns: id, year, response, 3yr spline, 5yr spline, 10yr spline. Any help would be greatly appreciated! Simple example below of my data:

id<-c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C","C","C")

year<-c("2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010")

response<-c(12,11,10,14,9,8,10,7,4,5,8,4,2,6,9,8,10,7,4,5,15,11,12,14,5,8,3,7,2,5)

df<-data.frame(id,year,response)

Hi, I'm not very familiar with splines, so I'm not sure the regression itself is set up correctly. Still, I think the following code can serve as an example:

library(tidyverse)
library(splines)

# Nest each id's rows into one data frame, fit a natural-spline regression per
# knot set, and keep the fitted values from predict(); then unnest back to rows.
df %>% nest(data = c(year, response)) %>% mutate(
    `3yr` = map(data, ~ lm(response ~ ns(year, knots = c(2002, 2005, 2008)), data = .x) %>% predict()),
    `5yr` = map(data, ~ lm(response ~ ns(year, knots = c(2003, 2007)), data = .x) %>% predict()),
    `10yr` = map(data, ~ lm(response ~ ns(year, knots = c(2005)), data = .x) %>% predict())
  ) %>% unnest(c(data, `3yr`, `5yr`, `10yr`))
# A tibble: 30 x 6
   id    year  response `3yr` `5yr` `10yr`
   <chr> <chr>    <dbl> <dbl> <dbl>  <dbl>
 1 A     2001        12 11.7  11.5   11.7 
 2 A     2002        11 11.4  11.6   11.5 
 3 A     2003        10 11.2  11.4   11.3 
 4 A     2004        14 11.0  11.0   10.9 
 5 A     2005         9 10.5  10.3   10.3 
 6 A     2006         8  9.49  9.32   9.41
 7 A     2007        10  8.13  8.18   8.29
 8 A     2008         7  6.72  6.92   6.99
 9 A     2009         4  5.47  5.57   5.57
10 A     2010         5  4.34  4.19   4.09

The trick here is to use the nested-data feature from tidyr to fit the regression for every group at once. The relevant vignette can be found here.
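Since the question asks for detrended values and the code above returns fitted values, here is a minimal base-R sketch of one way to get from one to the other. It assumes that "detrending" means subtracting the fitted spline from the response (that is, taking residuals), and it stores year as numeric rather than character. The helper `detrend_one`, the list `knot_sets`, and the `detr_*` column names are hypothetical; the knot positions are simply copied from the reply above and may not match what you mean by a 3yr, 5yr, or 10yr spline.

```r
library(splines)

# Example data from the question, with year stored as numeric (an assumption;
# the original post stores it as character).
df <- data.frame(
  id = rep(c("A", "B", "C"), each = 10),
  year = rep(2001:2010, times = 3),
  response = c(12, 11, 10, 14, 9, 8, 10, 7, 4, 5,
               8, 4, 2, 6, 9, 8, 10, 7, 4, 5,
               15, 11, 12, 14, 5, 8, 3, 7, 2, 5)
)

# Hypothetical helper: fit a natural spline with the given interior knots
# to one id's series and return the residuals (observed minus fitted).
detrend_one <- function(d, knots) {
  fit <- lm(response ~ ns(year, knots = knots), data = d)
  unname(residuals(fit))
}

# Knot sets copied from the reply above; the names are illustrative only.
knot_sets <- list(
  detr_3yr = c(2002, 2005, 2008),
  detr_5yr = c(2003, 2007),
  detr_10yr = 2005
)

# Split by id, add one residual column per knot set, then recombine.
detrended <- do.call(rbind, lapply(split(df, df$id), function(d) {
  for (nm in names(knot_sets)) {
    d[[nm]] <- detrend_one(d, knot_sets[[nm]])
  }
  d
}))
rownames(detrended) <- NULL
```

The result has one row per id and year, with the columns id, year, response, detr_3yr, detr_5yr, and detr_10yr. If you would rather detrend by division (a ratio instead of a difference), replace the residuals with `d$response / fitted(fit)` inside the helper.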

