Use lag within mutate to refer to previous rows mutation

I'd like to include values generated for previous rows as inputs to a mutate calculation. Some data:

mydiamonds <- diamonds %>%
  mutate(Ideal = ifelse(cut == 'Ideal', 1, 0)) %>% 
  group_by(Ideal) %>% 
  mutate(rn = row_number()) %>% 
  arrange(Ideal, rn) %>% 
  mutate(CumPrice = cumsum(price)) %>% 
  mutate(InitialPrice = min(price)) %>% 
  select(Ideal, rn, CumPrice, InitialPrice)

Looks like this:

mydiamonds %>% head
# A tibble: 6 x 4
# Groups:   Ideal [1]
  Ideal    rn CumPrice InitialPrice
  <dbl> <int>    <int>        <int>
1     0     1      326          326
2     0     2      653          326
3     0     3      987          326
4     0     4     1322          326
5     0     5     1658          326
6     0     6     1994          326

A model:

mod.diamonds = glm(CumPrice ~ log(lag(CumPrice)) +log(rn) + Ideal , family = "poisson", data = mydiamonds)

Test the model:

# new data, pretend we don't know CumPrice but want to use predictions to predict subsequent predictions
mydiamonds.testdata <- mydiamonds %>% select(-CumPrice)
# manual prediction based on lag(prediction), for the first row in each group use InitialPrice
## add coefficients as fields
coeffs <- mod.diamonds$coefficients
mydiamonds.testdata <- mydiamonds.testdata %>% 
  mutate(CoefIntercept = coeffs['(Intercept)'],
         CoefLogLagCumPrice = coeffs['log(lag(CumPrice))'],
         CoefLogRn = coeffs['log(rn)'],
         CoefIdeal = coeffs['Ideal']
         )

Here's how my test data look:

 mydiamonds.testdata %>% head
# A tibble: 6 x 7
# Groups:   Ideal [1]
  Ideal    rn InitialPrice CoefIntercept CoefLogLagCumPrice CoefLogRn CoefIdeal
  <dbl> <int>        <int>         <dbl>              <dbl>     <dbl>     <dbl>
1     0     1          326        0.0931              0.987    0.0154 -0.000715
2     0     2          326        0.0931              0.987    0.0154 -0.000715
3     0     3          326        0.0931              0.987    0.0154 -0.000715
4     0     4          326        0.0931              0.987    0.0154 -0.000715
5     0     5          326        0.0931              0.987    0.0154 -0.000715
6     0     6          326        0.0931              0.987    0.0154 -0.000715

Cannot use predict(), since I need to recursively predict where predictions for the previous day/row are input to the current day. Instead try manual prediction using the coefficents:

# prediction
mydiamonds.testdata <- mydiamonds.testdata %>% 
  mutate(
    Prediction = CoefIntercept + 
      
      # here's the hard bit. If it's the first row in the group, use InitialPrice, else use the value of the previous prediction
      (CoefLogLagCumPrice * ifelse(rn == 1, InitialPrice, lag(Prediction))) + 
      
      (CoefLogRn * log(rn)) + 
      (CoefIdeal * Ideal)
    )

Error: Problem with mutate() input Prediction. x object
'Prediction' not found :information_source: Input Prediction is +.... :information_source: The error
occurred in group 1: Ideal = 0.

How can I mutate in this way, where I'd like to refer to the previous rows mutation? (Unless it's the very first row, in which case use InitialPrice)

A suggestion


#make lagCumPrice in the input
(mydiamonds2 <- mydiamonds %>% ungroup %>% 
  mutate(lagCumPrice=if_else(rn==1,InitialPrice,lag(CumPrice))))

mod.diamonds = glm(CumPrice ~
                     #use it
                     log(lagCumPrice) +log(rn)
                   + Ideal ,
                   family = "poisson", data = mydiamonds2)

(mydiamonds3 <- mydiamonds2 %>% mutate(raw_pred = predict(mod.diamonds,newdata=.),
                       exp_pred = exp(raw_pred)) )

Hi thanks for the suggestion. The issue is though that CumPrice for the previous row is a prediction not an input. On training data it's there but I want to test the model. So as part of the test the CumPrice on the first row is the InitialPrice. On subsequent rows it's the prediction from the previous row :confused:

So the solution won't work if I'm reading it correctly.

I gave it a shot with accumulate, a function I'm less familiar with. Maybe someone can see how I'm using it wrong or if I can even use accumulate in this way?

mydiamonds.testdata <- mydiamonds.testdata %>% 
  mutate(
    Prediction = accumulate(.f = function(.) {
      
    .$CoefIntercept + 
      
      # here's the hard bit. If it's the first row in the group, use InitialPrice, else use the value of the previous prediction
      (.$CoefLogLagCumPrice * ifelse(.$rn == 1, .$InitialPrice, lag(.$Prediction))) + 
      
      (.$CoefLogRn * log(.$rn)) + 
      (.$CoefIdeal * .$Ideal)
      
      }))

Results in error:
< Error: Problem with mutate() input Prediction.
x argument ".x" is missing, with no default
:information_source: Input Prediction is accumulate(...).
:information_source: The error occurred in group 1: Ideal = 0.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.