# Computing variable based on the lagged values of same variable using dplyr?

Hi, RStudio community!

I've been working super hard on a `dplyr` problem. My goal is to compute a variable (column) based on the lagged values of such a column. The first value is given (in this example it's 100). The goal is to "imitate" the behavior of this spreadsheet:

• `value` starts at 100 in cell D2. • cell D3 (i.e. next value) is calculated based on the value on D2... • cell D4 is calculated based on the value on D3, and so on... Here's a reprex with an example tibble (`my_tbl`) and the `tibble` that I want to get (`target_tbl`). The last piece of code has my (failed) attempt:

``````library(tidyverse)

# --- A toy input ---
my_tbl <- tibble(
x = c(43.9375, 44.25, 44.3437),
y = c(0, 50, 0),
r = x / lag(x) - 1
)

# --- The output I wish to generate ---
target_tbl <- tibble(
x = c(43.9375, 44.25, 44.3437),
y = c(0, 50, 0),
r = x / lag(x) - 1,
value = c(100, 216.53, 216.99)
)

# --- Define n ---
n <- 2

# --- Attempt ---
my_attempt <- my_tbl %>%
rowwise() %>%
mutate(value = if_else(condition = is.na(lag(x)),
true = 100,
false = (lag(value) + y * n) * (1+r))
)
#> Error: Problem with `mutate()` input `value`.
#> i Input `value` is `if_else(...)`.
#> i The error occurred in row 1.
``````

Created on 2020-10-25 by the reprex package (v0.3.0)

Is this one of those cases where I HAVE to do it with a for-loop? Thank you very much in advance!
Best,
Alexis

``````my_attempt <- my_tbl %>%
# rowwise() %>%
mutate(value = cumsum(if_else(condition = is.na(lag(x)),
true = 100,
false = ( y * n) * (1+r)))
)``````
1 Like

Sorry @nirgrahamuk, I thought this solved the issue but it doesn't. The answer is missing adding `values`'s previous value, which is the main question at hand.

Nonetheless, I thank you for the answer!

Can you show with an example?
The second rows 200+ value would be 100 lower without the row before its 100.
its not explicitly the previous value, but it is implicitly as the previous value was constructed as an earlier cumsum with a term less.

Absolutely! Thank you for taking interest in solving this question.
Ok, so firstly, I had a mistake in the value of `n`, `n` should be `n = 2.3`. My apologies for that!

The data in the Excel file looks like this: The output given by your answer @nirgrahamuk is the following: I made a reprex with a for-loop that solves the problem. Hopefully, that's way more intuitive.

``````library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.3

# --- define x, y, and r as vectors ---
x <- c(43.9375, 44.25, 44.3437)   # x is given
y <- c(0, 50, 0)                  # y is given
r <- x / lag(x) - 1               # r depends on "x"

# --- Define n ---
n <- 2.3        # n is given

# --- create a numeric vector of length 3 to store values ---
value <- vector(mode = 'numeric', length = 3)

# --- For loop to get values ---
for (i in seq_along(value)) {

if (i == 1) {

# The first value should be 100
value[[i]] <- 100
}

if (i > 1) {

# Consequent values [i] depend on the lagged value [i-1]
value[[i]] <- (value[[i-1]] + (y[[i]]*n)) * (1+r[[i]])

}

}

# Print output
value
#>  100.0000 216.5292 216.9877
``````

Created on 2020-10-26 by the reprex package (v0.3.0)

Here is a similar StackOverflow question:

The loop is a good option. Note, subsetting a vector is typically done with a single bracket `[]` instead of a double bracket `[[]]`.

``````# --- create a numeric vector of length 3 to store values ---
value <- vector(mode = 'numeric', length = 3)
value[1L] = 100

# --- For loop to get values ---
for (i in seq_along(value)[-1L]) {
value [i] = (value[i-1L] + (y[i]*n)) * (1+r[i])
}
``````

Alternatively, similar to how the StackOverflow question suggests `Reduce` (or even better, if you can reformulate to a non-recursive approach), I believe you could use `accumulate` which is similar to `Reduce(..., accumulate = TRUE)` although I am having difficulty formulating it with the additional `r` argument.

2 Likes

Thanks a lot Col! I will try using the `accumulate` function. It appears to be the case that this can't be done with `dplyr` and a loop is the way to go here?   