 # row-wise iteration in a dataframe with interdependent variables

[Following on from a question I asked about row-wise iteration where the value of a variable at row n depends on its value at row n-1, asked and answered here]

What approach should I take to a recursive row-wise iteration problem where you have multiple columns you want to fill in, and they are interdependent on one another?

For example, say you have this tibble:

``````library(tidyverse)

x <- tibble(t = c(1:10),
a = c(seq(100, 140, 10), rep(NA_real_, 5)),
b = c(runif(5), rep(NA_real_, 5)),
c = c(runif(5), rep(NA_real_, 5)))
``````

We want to complete the a, b, and c columns. Where they already have a value, those values should remain. For the rows with NAs, we want to fill them in as follows:

a = lag(a, 1) * (1 + growth)
b = a * lag(b, 1)
c = b * lag(a, 2)

I have a feeling `purrr::pmap()` is the right approach for this type of multi-column problem, but I cannot get it to do what I want in this case.

Hi @mattcowgill,

I tried with `pmap` and other avenues but I don't think that a recursive computation as general as this can be achieved by `pmap` - I'd be happy to see a solution if that is possible. So, after thinking about the problem quite a bit, one (not easy) way is to encode the recursion in functions and then just simply `mutate`. Sorry for the poorly commented script, but can you please see if it works and does what you would like it to do. Mind you the `b` column (and `c` consequently) quickly explodes as it grows exponentially with the values in `a` which are quite large in this case.

Actually, I think you might be better off sticking to loops ``````library(tidyverse)

x <- tibble(t = c(1:10),
a = c(seq(100, 140, 10), rep(NA_real_, 5)),
b = c(runif(5), rep(NA_real_, 5)),
c = c(runif(5), rep(NA_real_, 5)))

fill_lag <- function(vector, position, growth = 0.03) {

if_else(!is.na(vector[position]),
vector[position],
(1+growth)^(sum(is.na(vector[1:position]))) * vector[sum(!is.na(vector[1:position]))]   )

}

fill_lag_mult <- function(vector, position, multiplier) {

if_else(!is.na(vector[position]),
vector[position],
prod(multiplier[(sum(!is.na(vector[1:position]))+1):position]) * vector[sum(!is.na(vector[1:position]))]   )

}

x %>% mutate(
a = purrr::map_dbl(t, function(n) {fill_lag(a, n)}),
b = purrr::map_dbl(t, function(n) {fill_lag_mult(b, n, multiplier = a)}),
c = if_else(!is.na(c), c, b * lag(a, 2))
)
#> # A tibble: 10 x 4
#>        t     a        b        c
#>    <int> <dbl>    <dbl>    <dbl>
#>  1     1  100  6.95e- 1 2.64e- 1
#>  2     2  110  3.00e- 1 7.70e- 2
#>  3     3  120  3.48e- 1 8.92e- 1
#>  4     4  130  8.15e- 1 6.18e- 1
#>  5     5  140  3.74e- 1 3.04e- 1
#>  6     6  144. 5.40e+ 1 7.02e+ 3
#>  7     7  149. 8.02e+ 3 1.12e+ 6
#>  8     8  153. 1.23e+ 6 1.77e+ 8
#>  9     9  158. 1.93e+ 8 2.87e+10
#> 10    10  162. 3.14e+10 4.80e+12
``````

Created on 2019-09-02 by the reprex package (v0.3.0)

Thanks very much, Valeri. I think you're right - in this case a for loop might be easier!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.