row-wise iteration in a dataframe with interdependent variables

[Following on from a question I asked about row-wise iteration where the value of a variable at row n depends on its value at row n-1, asked and answered here]

What approach should I take to a recursive row-wise iteration problem where you have multiple columns you want to fill in, and they are interdependent on one another?

For example, say you have this tibble:

library(tidyverse)

x <- tibble(t = c(1:10),
            a = c(seq(100, 140, 10), rep(NA_real_, 5)),
            b = c(runif(5), rep(NA_real_, 5)),
            c = c(runif(5), rep(NA_real_, 5)))

We want to complete the a, b, and c columns. Where they already have a value, those values should remain. For the rows with NAs, we want to fill them in as follows:

a = lag(a, 1) * (1 + growth)
b = a * lag(b, 1)
c = b * lag(a, 2)

I have a feeling purrr::pmap() is the right approach for this type of multi-column problem, but I cannot get it to do what I want in this case.

Hi @mattcowgill,

I tried with pmap and other avenues but I don't think that a recursive computation as general as this can be achieved by pmap - I'd be happy to see a solution if that is possible. So, after thinking about the problem quite a bit, one (not easy) way is to encode the recursion in functions and then just simply mutate. Sorry for the poorly commented script, but can you please see if it works and does what you would like it to do. Mind you the b column (and c consequently) quickly explodes as it grows exponentially with the values in a which are quite large in this case.

Actually, I think you might be better off sticking to loops :slight_smile:

library(tidyverse)

x <- tibble(t = c(1:10),
                        a = c(seq(100, 140, 10), rep(NA_real_, 5)),
                        b = c(runif(5), rep(NA_real_, 5)),
                        c = c(runif(5), rep(NA_real_, 5)))

fill_lag <- function(vector, position, growth = 0.03) {
    
    if_else(!is.na(vector[position]), 
                    vector[position], 
                    (1+growth)^(sum(is.na(vector[1:position]))) * vector[sum(!is.na(vector[1:position]))]   )
    
    
}

fill_lag_mult <- function(vector, position, multiplier) {
    
    if_else(!is.na(vector[position]), 
                    vector[position], 
                    prod(multiplier[(sum(!is.na(vector[1:position]))+1):position]) * vector[sum(!is.na(vector[1:position]))]   )
    
    
}

x %>% mutate(
    a = purrr::map_dbl(t, function(n) {fill_lag(a, n)}),
    b = purrr::map_dbl(t, function(n) {fill_lag_mult(b, n, multiplier = a)}),
    c = if_else(!is.na(c), c, b * lag(a, 2))
)
#> # A tibble: 10 x 4
#>        t     a        b        c
#>    <int> <dbl>    <dbl>    <dbl>
#>  1     1  100  6.95e- 1 2.64e- 1
#>  2     2  110  3.00e- 1 7.70e- 2
#>  3     3  120  3.48e- 1 8.92e- 1
#>  4     4  130  8.15e- 1 6.18e- 1
#>  5     5  140  3.74e- 1 3.04e- 1
#>  6     6  144. 5.40e+ 1 7.02e+ 3
#>  7     7  149. 8.02e+ 3 1.12e+ 6
#>  8     8  153. 1.23e+ 6 1.77e+ 8
#>  9     9  158. 1.93e+ 8 2.87e+10
#> 10    10  162. 3.14e+10 4.80e+12

Created on 2019-09-02 by the reprex package (v0.3.0)

Thanks very much, Valeri. I think you're right - in this case a for loop might be easier!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.