Computing variable based on the lagged values of same variable using dplyr?

Hi, RStudio community!

I've been working super hard on a dplyr problem. My goal is to compute a variable (column) based on the lagged values of such a column. The first value is given (in this example it's 100). The goal is to "imitate" the behavior of this spreadsheet:

  • value starts at 100 in cell D2.
    image

  • cell D3 (i.e. next value) is calculated based on the value on D2...
    image

  • cell D4 is calculated based on the value on D3, and so on...
    image

Here's a reprex with an example tibble (my_tbl) and the tibble that I want to get (target_tbl). The last piece of code has my (failed) attempt:

library(tidyverse)

# --- A toy input ---
my_tbl <- tibble(
  x = c(43.9375, 44.25, 44.3437),
  y = c(0, 50, 0),
  r = x / lag(x) - 1
)

# --- The output I wish to generate ---
target_tbl <- tibble(
  x = c(43.9375, 44.25, 44.3437),
  y = c(0, 50, 0),
  r = x / lag(x) - 1,
  value = c(100, 216.53, 216.99)
)


# --- Define n ---
n <- 2

# --- Attempt ---
my_attempt <- my_tbl %>% 
  rowwise() %>% 
  mutate(value = if_else(condition = is.na(lag(x)),
                         true = 100, 
                         false = (lag(value) + y * n) * (1+r))
  )
#> Error: Problem with `mutate()` input `value`.
#> x object 'value' not found
#> i Input `value` is `if_else(...)`.
#> i The error occurred in row 1.

Created on 2020-10-25 by the reprex package (v0.3.0)

Is this one of those cases where I HAVE to do it with a for-loop? :frowning:

Thank you very much in advance!
Best,
Alexis

my_attempt <- my_tbl %>% 
  # rowwise() %>% 
  mutate(value = cumsum(if_else(condition = is.na(lag(x)),
                         true = 100, 
                         false = ( y * n) * (1+r)))
  )
1 Like

Sorry @nirgrahamuk, I thought this solved the issue but it doesn't. The answer is missing adding values's previous value, which is the main question at hand.

Nonetheless, I thank you for the answer!

Can you show with an example?
The second rows 200+ value would be 100 lower without the row before its 100.
its not explicitly the previous value, but it is implicitly as the previous value was constructed as an earlier cumsum with a term less.

Absolutely! Thank you for taking interest in solving this question.
Ok, so firstly, I had a mistake in the value of n, n should be n = 2.3. My apologies for that!

The data in the Excel file looks like this:
image

The output given by your answer @nirgrahamuk is the following:

image

I made a reprex with a for-loop that solves the problem. Hopefully, that's way more intuitive.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.3

# --- define x, y, and r as vectors --- 
x <- c(43.9375, 44.25, 44.3437)   # x is given
y <- c(0, 50, 0)                  # y is given
r <- x / lag(x) - 1               # r depends on "x"

# --- Define n ---
n <- 2.3        # n is given


# --- create a numeric vector of length 3 to store values ---
value <- vector(mode = 'numeric', length = 3)


# --- For loop to get values ---
for (i in seq_along(value)) {
  
  if (i == 1) {
    
    # The first value should be 100
    value[[i]] <- 100                    
  }
  
  if (i > 1) {
    
    # Consequent values [i] depend on the lagged value [i-1]
    value[[i]] <- (value[[i-1]] + (y[[i]]*n)) * (1+r[[i]])
    
  }
  
}

# Print output
value
#> [1] 100.0000 216.5292 216.9877

Created on 2020-10-26 by the reprex package (v0.3.0)

Here is a similar StackOverflow question:

The loop is a good option. Note, subsetting a vector is typically done with a single bracket [] instead of a double bracket [[]].

# --- create a numeric vector of length 3 to store values ---
value <- vector(mode = 'numeric', length = 3)
value[1L] = 100

# --- For loop to get values ---
for (i in seq_along(value)[-1L]) {
  value [i] = (value[i-1L] + (y[i]*n)) * (1+r[i])
}

Alternatively, similar to how the StackOverflow question suggests Reduce (or even better, if you can reformulate to a non-recursive approach), I believe you could use accumulate which is similar to Reduce(..., accumulate = TRUE) although I am having difficulty formulating it with the additional r argument.

2 Likes

Thanks a lot Col! I will try using the accumulate function. It appears to be the case that this can't be done with dplyr and a loop is the way to go here? :frowning: :cry: :sob:

Thank you all for your answers!!