I would like advice about the best way to approach row-wise iteration where the value of a variable at row n is dependent on the value at row n-1. I would prefer solutions that use dplyr and/or purrr. I've struggled with this for years, getting by on a mix of for loops, purrrlyr::by_row(), sometimes purrr::pmap, etc. but I've never quite felt I've settled on a good solution.
Here's an example that returns the desired output, in a horrible clunky for-loop-y way.
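library(tidyverse)
x <- tibble(a = c(1:10),
            b = c(seq(100, 140, 10), rep(NA_real_, 5)))
# Fill each NA with the previous row's value grown by `growth`
fill_in <- function(x, growth = 0.03) {
  x <- if_else(!is.na(x), x, lag(x, 1) * (1 + growth))
  x
}
# A pass can only fill an NA whose previous row is already filled,
# so repeat once per row
for (i in seq_len(nrow(x))) {
  x <- x %>%
    mutate(b = fill_in(b))
}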
What is the best way to do this? I've read the discussion here and here and have read Jenny Bryan's row-wise slide deck and Winston Chang's blog post and am still not clear. Thank you.
I would use purrr::accumulate in this case, where the lag is one. It just requires a small modification to your function so that it takes two arguments: the previous value and the current value.
See:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
x <- tibble(a = c(1:10),
            b = c(seq(100, 140, 10), rep(NA_real_, 5)))
# prev is the accumulated (already filled) value, new is the next element of b
fill_in <- function(prev, new, growth = 0.03) {
  if_else(!is.na(new), new, prev * (1 + growth))
}
options(pillar.sigfig = 5)
x %>%
  mutate(b = accumulate(b, fill_in))
#> # A tibble: 10 x 2
#>        a      b
#>    <int>  <dbl>
#>  1     1 100
#>  2     2 110
#>  3     3 120
#>  4     4 130
#>  5     5 140
#>  6     6 144.2
#>  7     7 148.53
#>  8     8 152.98
#>  9     9 157.57
#> 10    10 162.30
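To see why this works, it may help to spell out what accumulate() is doing: it takes the first element of b as the starting value, then calls the function with the running result and the next element, keeping every intermediate result. A rough base R equivalent is Reduce() with accumulate = TRUE. This is only a sketch: it assumes the first value of b is not NA, and it swaps if_else() for a plain if, since the function only ever sees one value at a time.

```r
# Same two-argument idea as fill_in() above, written with base R only.
fill_in <- function(prev, new, growth = 0.03) {
  if (is.na(new)) prev * (1 + growth) else new
}

b <- c(seq(100, 140, 10), rep(NA_real_, 5))

# Reduce(accumulate = TRUE) keeps every intermediate result, so
# element n is computed from the already-filled element n - 1.
filled <- Reduce(fill_in, b, accumulate = TRUE)
round(filled, 2)
```

Because each call receives the already-filled previous value, a single pass is enough, unlike the repeated mutate() in the original for loop.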
Thanks so much for that, that's a really elegant solution. I had looked at accumulate() before but couldn't figure out how to apply it to this type of problem.
I realise now that my reprex is perhaps overly simplified and doesn't capture what it is that I'm trying to do...