row-wise iteration in a dataframe where each row depends on previous values

Hi,

I would like advice about the best way to approach row-wise iteration where the value of a variable at row n is dependent on the value at row n-1. I would prefer solutions that use dplyr and/or purrr. I've struggled with this for years, getting by on a mix of for loops, purrrlyr::by_row(), sometimes purrr::pmap, etc. but I've never quite felt I've settled on a good solution.

Here's an example that returns the desired output, in a horrible clunky for-loop-y way.


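library(tidyverse)

x <- tibble(a = 1:10,
            b = c(seq(100, 140, 10), rep(NA_real_, 5)))

# Fill each NA with the previous row's value grown by `growth`.
# A single pass only fills the first NA in a run, because lag() sees the
# column as it was before this pass.
fill_in <- function(x, growth = 0.03) {
  if_else(!is.na(x), x, lag(x, 1) * (1 + growth))
}

# Repeat the pass once per row so that every NA is eventually filled.
for (i in seq_len(nrow(x))) {
  x <- x %>%
    mutate(b = fill_in(b))
}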


What is the best way to do this? I've read the discussion here and here and have read Jenny Bryan's row-wise slide deck and Winston Chang's blog post and am still not clear. Thank you.


I would use purrr::accumulate() in this case, since the lag is one. It just requires a small modification to your function so that it takes two arguments: the previous value and the current value.
See:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

x <- tibble(a = c(1:10),
            b = c(seq(100, 140, 10), rep(NA_real_, 5)) )

fill_in <- function(prev, new, growth = 0.03) {
  if_else(!is.na(new), new, prev * (1 + growth))
}

options(pillar.sigfig = 5)
x %>%
  mutate(b = accumulate(b, fill_in))
#> # A tibble: 10 x 2
#>        a      b
#>    <int>  <dbl>
#>  1     1 100   
#>  2     2 110   
#>  3     3 120   
#>  4     4 130   
#>  5     5 140   
#>  6     6 144.2 
#>  7     7 148.53
#>  8     8 152.98
#>  9     9 157.57
#> 10    10 162.30

Created on 2019-08-31 by the reprex package (v0.3.0)
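
For anyone unsure what accumulate() is doing here, the following is a rough base R sketch of the same idea, not part of the original answer. It assumes a scalar version of fill_in() (using plain `if` instead of if_else()) and a bare vector `b` rather than the tibble. The explicit loop and Reduce(accumulate = TRUE) should both produce the same running result, where each element is computed from the previously filled element.

```r
# Toy input matching column b from the thread.
b <- c(seq(100, 140, 10), rep(NA_real_, 5))

# Two-argument step function (scalar variant, an assumption for this sketch):
# keep an observed value, otherwise grow the previous, already filled, value.
fill_in <- function(prev, new, growth = 0.03) {
  if (!is.na(new)) new else prev * (1 + growth)
}

# Explicit loop: element i is computed from the filled element i - 1.
filled_loop <- b
for (i in seq_along(b)[-1]) {
  filled_loop[i] <- fill_in(filled_loop[i - 1], b[i])
}

# Base R counterpart of the accumulate() call: the running result is carried
# forward as the first argument of fill_in().
filled_reduce <- Reduce(fill_in, b, accumulate = TRUE)

all.equal(filled_loop, filled_reduce)
```

The key point is that the first argument is the accumulated (already filled) value, not the original lagged column, which is why a single pass is enough.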


Hi Christophe,

Thanks so much for that, that's a really elegant solution. I had looked at accumulate() before but couldn't figure out how to apply it to this type of problem.

I realise now that my reprex is perhaps overly simplified and doesn't capture what it is that I'm trying to do...

I've posted a follow-up here.
