I'm spinning my wheels on a nitty-gritty R chunk that I'll try to illustrate with the diamonds data set below. In short, I need previous predictions as inputs to new predictions, but within a grouped data frame. So there are two complications: I need something like predict(model, target ~ log(lag(previous prediction))), and the target is a cumulative sum within groups, so the lag has to be a within-group lag.
Some example code:
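library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Cumulative price within each cut/color group
mydiamonds <- diamonds %>%
  group_by(cut, color) %>%
  mutate(rn = row_number()) %>%
  arrange(cut, color, rn) %>%
  mutate(CumPrice = cumsum(price))
# CumPrice is both the target and, lagged, a predictor
mod.diamonds <- glm(CumPrice ~ log(lag(CumPrice)) + cut + color, family = "poisson", data = mydiamonds)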
With new data, I will not know CumPrice except for its initial value at rn == 1. I want to predict it row by row, with each row's prediction feeding into the next one. Again, this is within groups, so I cannot simply apply the model across the raw data frame.
mydiamonds.test <- mydiamonds %>% select(-CumPrice)
Pretend that mydiamonds.test is completely new hold-out data that doesn't contain CumPrice, which is both the target and a predictor (log(lag(CumPrice))).
How could I predict onto mydiamonds.test?
[edit]
Added the purrr tag since someone suggested purrr::accumulate(), which I'm looking over just now.
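To make the recursion concrete, here is a minimal sketch of how accumulate() might express it. It rests on some assumptions: the lag is precomputed inside each group as a column (LagCumPrice, a name made up here) instead of calling lag() in the formula, the first price in each group stands in for the known initial CumPrice, and predict_group and PredCumPrice are hypothetical names. Only five rows per group are scored to keep it quick.

```r
library(dplyr)
library(purrr)
library(ggplot2)  # provides the diamonds data set

# Training data: compute the lag within each cut/color group, then drop
# the first row of every group (it has no previous value).
train <- diamonds %>%
  group_by(cut, color) %>%
  mutate(CumPrice = cumsum(price),
         LagCumPrice = lag(CumPrice)) %>%   # hypothetical helper column
  ungroup() %>%
  filter(!is.na(LagCumPrice))

mod <- glm(CumPrice ~ log(LagCumPrice) + cut + color,
           family = "poisson", data = train)

# Hypothetical helper: roll predictions forward for one group.
# Each step feeds the previous prediction back in as LagCumPrice.
predict_group <- function(first_value, n, cut_level, color_level) {
  step <- function(prev, i) {
    nd <- data.frame(LagCumPrice = prev, cut = cut_level, color = color_level)
    unname(predict(mod, newdata = nd, type = "response"))
  }
  unlist(accumulate(seq_len(n - 1), step, .init = as.numeric(first_value)))
}

# Hold-out style data: only the value at the first row of each group is known.
test <- diamonds %>%
  group_by(cut, color) %>%
  slice_head(n = 5) %>%
  mutate(PredCumPrice = predict_group(first(price), n(),
                                      first(cut), first(color))) %>%
  ungroup()
```

This calls predict() once per row, so it would likely be slow on the full data, and any error in an early prediction is carried into every later row of that group.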