row-wise iteration in a dataframe with interdependent variables

mattcowgill · September 1, 2019, 11:00pm

[Following on from a question I asked about row-wise iteration where the value of a variable at row n depends on its value at row n-1, asked and answered here]

row-wise iteration in a dataframe where each row depends on previous values tidyverse

Hi, I would like advice about the best way to approach row-wise iteration where the value of a variable at row n is dependent on the value at row n-1. I would prefer solutions that use dplyr and/or purrr. I've struggled with this for years, getting by on a mix of for loops, purrrlyr::by_row(), sometimes purrr::pmap, etc. but I've never quite felt I've settled on a good solution. Here's an example that returns the desired output, in a horrible clunky for-loop-y way. library(tidyverse) x <- tibble(a = c(1:10), b = c(seq(100, 140, 10), rep(NA_real_, 5)) ) fill_in <- function(x, growth = 0.03) { x <- if_else(!is.na(x), x, lag(x, 1) * (1 + growth)) x } for(i in 1:nrow(x)) …

What approach should I take to a recursive row-wise iteration problem where you have multiple columns you want to fill in, and they are interdependent on one another?

For example, say you have this tibble:

library(tidyverse)

x <- tibble(t = c(1:10),
            a = c(seq(100, 140, 10), rep(NA_real_, 5)),
            b = c(runif(5), rep(NA_real_, 5)),
            c = c(runif(5), rep(NA_real_, 5)))

We want to complete the a, b, and c columns. Where they already have a value, those values should remain. For the rows with NAs, we want to fill them in as follows:

a = lag(a, 1) * (1 + growth)
b = a * lag(b, 1)
c = b * lag(a, 2)

I have a feeling purrr::pmap() is the right approach for this type of multi-column problem, but I cannot get it to do what I want in this case.

valeri · September 2, 2019, 8:26pm

Hi @mattcowgill,

I tried with pmap and other avenues but I don't think that a recursive computation as general as this can be achieved by pmap - I'd be happy to see a solution if that is possible. So, after thinking about the problem quite a bit, one (not easy) way is to encode the recursion in functions and then just simply mutate. Sorry for the poorly commented script, but can you please see if it works and does what you would like it to do. Mind you the b column (and c consequently) quickly explodes as it grows exponentially with the values in a which are quite large in this case.

Actually, I think you might be better off sticking to loops

library(tidyverse)

x <- tibble(t = c(1:10),
                        a = c(seq(100, 140, 10), rep(NA_real_, 5)),
                        b = c(runif(5), rep(NA_real_, 5)),
                        c = c(runif(5), rep(NA_real_, 5)))

fill_lag <- function(vector, position, growth = 0.03) {
    
    if_else(!is.na(vector[position]), 
                    vector[position], 
                    (1+growth)^(sum(is.na(vector[1:position]))) * vector[sum(!is.na(vector[1:position]))]   )
    
    
}

fill_lag_mult <- function(vector, position, multiplier) {
    
    if_else(!is.na(vector[position]), 
                    vector[position], 
                    prod(multiplier[(sum(!is.na(vector[1:position]))+1):position]) * vector[sum(!is.na(vector[1:position]))]   )
    
    
}

x %>% mutate(
    a = purrr::map_dbl(t, function(n) {fill_lag(a, n)}),
    b = purrr::map_dbl(t, function(n) {fill_lag_mult(b, n, multiplier = a)}),
    c = if_else(!is.na(c), c, b * lag(a, 2))
)
#> # A tibble: 10 x 4
#>        t     a        b        c
#>    <int> <dbl>    <dbl>    <dbl>
#>  1     1  100  6.95e- 1 2.64e- 1
#>  2     2  110  3.00e- 1 7.70e- 2
#>  3     3  120  3.48e- 1 8.92e- 1
#>  4     4  130  8.15e- 1 6.18e- 1
#>  5     5  140  3.74e- 1 3.04e- 1
#>  6     6  144. 5.40e+ 1 7.02e+ 3
#>  7     7  149. 8.02e+ 3 1.12e+ 6
#>  8     8  153. 1.23e+ 6 1.77e+ 8
#>  9     9  158. 1.93e+ 8 2.87e+10
#> 10    10  162. 3.14e+10 4.80e+12

Created on 2019-09-02 by the reprex package (v0.3.0)

mattcowgill · September 2, 2019, 10:05pm

Thanks very much, Valeri. I think you're right - in this case a for loop might be easier!

system · September 23, 2019, 10:05pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.