I've been working super hard on a dplyr problem. My goal is to compute a variable (column) based on the lagged values of such a column. The first value is given (in this example it's 100). The goal is to "imitate" the behavior of this spreadsheet:
value starts at 100 in cell D2.
cell D3 (i.e. next value) is calculated based on the value on D2...
cell D4 is calculated based on the value on D3, and so on...
Here's a reprex with an example tibble (my_tbl) and the tibble that I want to get (target_tbl). The last piece of code has my (failed) attempt:
library(tidyverse)
# --- A toy input ---
my_tbl <- tibble(
x = c(43.9375, 44.25, 44.3437),
y = c(0, 50, 0),
r = x / lag(x) - 1
)
# --- The output I wish to generate ---
target_tbl <- tibble(
x = c(43.9375, 44.25, 44.3437),
y = c(0, 50, 0),
r = x / lag(x) - 1,
value = c(100, 216.53, 216.99)
)
# --- Define n ---
n <- 2
# --- Attempt ---
my_attempt <- my_tbl %>%
rowwise() %>%
mutate(value = if_else(condition = is.na(lag(x)),
true = 100,
false = (lag(value) + y * n) * (1+r))
)
#> Error: Problem with `mutate()` input `value`.
#> x object 'value' not found
#> i Input `value` is `if_else(...)`.
#> i The error occurred in row 1.
Sorry @nirgrahamuk, I thought this solved the issue but it doesn't. The answer is missing adding values's previous value, which is the main question at hand.
Can you show with an example?
The second rows 200+ value would be 100 lower without the row before its 100.
its not explicitly the previous value, but it is implicitly as the previous value was constructed as an earlier cumsum with a term less.
Absolutely! Thank you for taking interest in solving this question.
Ok, so firstly, I had a mistake in the value of n, n should be n = 2.3. My apologies for that!
The data in the Excel file looks like this:
The output given by your answer @nirgrahamuk is the following:
I made a reprex with a for-loop that solves the problem. Hopefully, that's way more intuitive.
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.3
# --- define x, y, and r as vectors ---
x <- c(43.9375, 44.25, 44.3437) # x is given
y <- c(0, 50, 0) # y is given
r <- x / lag(x) - 1 # r depends on "x"
# --- Define n ---
n <- 2.3 # n is given
# --- create a numeric vector of length 3 to store values ---
value <- vector(mode = 'numeric', length = 3)
# --- For loop to get values ---
for (i in seq_along(value)) {
if (i == 1) {
# The first value should be 100
value[[i]] <- 100
}
if (i > 1) {
# Consequent values [i] depend on the lagged value [i-1]
value[[i]] <- (value[[i-1]] + (y[[i]]*n)) * (1+r[[i]])
}
}
# Print output
value
#> [1] 100.0000 216.5292 216.9877
The loop is a good option. Note, subsetting a vector is typically done with a single bracket [] instead of a double bracket [[]].
# --- create a numeric vector of length 3 to store values ---
value <- vector(mode = 'numeric', length = 3)
value[1L] = 100
# --- For loop to get values ---
for (i in seq_along(value)[-1L]) {
value [i] = (value[i-1L] + (y[i]*n)) * (1+r[i])
}
Alternatively, similar to how the StackOverflow question suggests Reduce (or even better, if you can reformulate to a non-recursive approach), I believe you could use accumulate which is similar to Reduce(..., accumulate = TRUE) although I am having difficulty formulating it with the additional r argument.
Thanks a lot Col! I will try using the accumulate function. It appears to be the case that this can't be done with dplyr and a loop is the way to go here?