Filling a tibble rowwise without a for loop

sgranitz · January 19, 2018, 9:49pm

I have the following tibble which can be n rows long dependent on column A.

A is known.
B,C,D starting values are known.
x,y,z are constants.

x <- 0.01
y <- 0.015
z <- 0.03

A <- runif(5, 0, 2)
B <- c(1, rep(NA, 4))
C <- c(1, rep(NA, 4))
D <- c(B[1] * x + C[1] * y, rep(NA, 4))

df <- tibble(A,B,C,D)

> df
           A  B  C     D
1 1.43072347  1  1 0.025
2 1.19097936 NA NA    NA
3 0.07721345 NA NA    NA
4 0.19607427 NA NA    NA
5 1.69851615 NA NA    NA

Each B[i] is B[i-1] * A[i-1] - D[i-1]
Each C[i] is max(C[i-1] + z, B[i])
Each D[i] is B[i] * x + C[i] * y

This is an easy function to write as:

f <- function(data, x, y, z) {
  B <- data$A * data$B - data$D
  C <- max(data$C + z, B)
  D <- B * x + C * y
  c(B,C,D)
}

My question is how to apply the function where data is df[(i-1), ] to fill out my tibble of n rows without using a for loop.

for (i in 2:nrow(df)) df[i, 2:4] <- f(df[(i-1), ], x, y, z)

> df
           A            B        C          D
1 1.43072347  1.000000000 1.000000 0.02500000
2 1.19097936  1.405723470 1.405723 0.03514309
3 0.07721345  1.639044546 1.639045 0.04097611
4 0.19607427  0.085580171 1.669045 0.02589147
5 1.69851615 -0.009111401 1.699045 0.02539455

To build further on this, there are J sets of A, each with its own B,C,D.

I would like to map a function over a tibble of A[1:n, 1:j] in an efficient manner to calculate B[2:n, 1:j], C[2:n, 1:j], D[2:n, 1:j]. I am guessing the answer would be related to purr but I haven't found a good solution yet. Would appreciate if anyone could point me in the right direction or share any ideas.

cderv · January 20, 2018, 10:27am

This is one solution I think of. Maybe not the best but seems clean to me.Your equation makes me think of recursion.

First we are working on a vector A, then we will see how this could be apply thanks to purrr to a largeA compose of J vectors of size N.
Inputs should be a vector A, and parameters x, y, z. For first element we have an initialization. (your fist data row in your data.frame).
Then we apply a function recursively until the last element of A (B[5] needs A[4], but A[5] is not useful). During the recursion we save
each results as a on row tibble. At the end of the recursion we binds those tibbles and add the A vector to get results as a data.frame.

The code for what I just explained:

# Parameters
x <- 0.01
y <- 0.015
z <- 0.03
# for reproducibility
set.seed(42)

# We define A as a vector
A <- runif(5, 0, 2)

# Main function to generate the result
main <- function(A, x, y, z) {
  # initialization function
  init <- function(x, y, z) {
    B <- 1
    C <- 1
    D <- B * x + C * y
    tibble::data_frame(B = B, C = B, D = D)
  }
  # initialization of B, C, D
  init <- init(x, y, z)
  # create resulting object
  res <- list(init)
  # recursion on vector A
  g <- function(A, B, C, D, x, y, z) {
    # compute B, C, D
    B <- A[1] * B - D
    C <- max(C + z, B)
    D <- B * x + C * y
    # update res object in previous environnement
    res <<- append(res, list(tibble::tibble(B= B, C = C, D = D)))
    # reduce vector A by 1 element
    A <- A[-1]
    # stop at last element of A
    if (length(A)==1) return(NULL)
    # if not last element continue
    g(A,B,C,D, x, y, z)
  }
  # apply recursion function
  g(A, init$B, init$C, init$D, x, y, z)
  # get back the result as a tibble
  tibble::add_column(dplyr::bind_rows(res), A, .before = 1)
}

# lets get our result! 
main(A, x, y, z)
#> # A tibble: 5 x 4
#>       A     B     C      D
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1 1.83   1.00  1.00 0.0250
#> 2 1.87   1.80  1.80 0.0451
#> 3 0.572  3.34  3.34 0.0834
#> 4 1.66   1.83  3.37 0.0688
#> 5 1.28   2.96  3.40 0.0806

It works for a vector A.
Now if we have J set of vector A of size N, we can use purrr to apply on each set.

# parameters
n <- 5
j <- 3
set.seed(42)
# we need purrr
library(purrr)
# create a list of j vector of size n
rerun(j, runif(n, 0, 2)) %>%
  # name those set for next step
  set_names(nm = paste0("set_", seq_along(.))) %>%
  # apply main function on each vectros and get back a one tibble with a set_id colum
  map_df(~ main(.x, x = x, y = y, z = z), .id = "set_id")
#> # A tibble: 15 x 5
#>    set_id     A     B     C      D
#>    <chr>  <dbl> <dbl> <dbl>  <dbl>
#>  1 set_1  1.83  1.00   1.00 0.0250
#>  2 set_1  1.87  1.80   1.80 0.0451
#>  3 set_1  0.572 3.34   3.34 0.0834
#>  4 set_1  1.66  1.83   3.37 0.0688
#>  5 set_1  1.28  2.96   3.40 0.0806
#>  6 set_2  1.04  1.00   1.00 0.0250
#>  7 set_2  1.47  1.01   1.03 0.0256
#>  8 set_2  0.269 1.47   1.47 0.0367
#>  9 set_2  1.31  0.358  1.50 0.0260
#> 10 set_2  1.41  0.445  1.53 0.0274
#> 11 set_3  0.915 1.00   1.00 0.0250
#> 12 set_3  1.44  0.890  1.03 0.0244
#> 13 set_3  1.87  1.26   1.26 0.0314
#> 14 set_3  0.511 2.32   2.32 0.0579
#> 15 set_3  0.925 1.13   2.35 0.0465

we get the result we are looking for as one table. I you prefer several list's elements.

set.seed(42)
rerun(j, runif(n, 0, 2)) %>%
  map(~ main(.x, x = x, y = y, z = z)) %>%
  str(2)
#> List of 3
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    5 obs. of  4 variables:
#>   ..$ A: num [1:5] 1.83 1.874 0.572 1.661 1.283
#>   ..$ B: num [1:5] 1 1.8 3.34 1.83 2.96
#>   ..$ C: num [1:5] 1 1.8 3.34 3.37 3.4
#>   ..$ D: num [1:5] 0.025 0.0451 0.0834 0.0688 0.0806
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    5 obs. of  4 variables:
#>   ..$ A: num [1:5] 1.038 1.473 0.269 1.314 1.41
#>   ..$ B: num [1:5] 1 1.013 1.467 0.358 0.445
#>   ..$ C: num [1:5] 1 1.03 1.47 1.5 1.53
#>   ..$ D: num [1:5] 0.025 0.0256 0.0367 0.026 0.0274
#>  $ :Classes 'tbl_df', 'tbl' and 'data.frame':    5 obs. of  4 variables:
#>   ..$ A: num [1:5] 0.915 1.438 1.869 0.511 0.925
#>   ..$ B: num [1:5] 1 0.89 1.26 2.32 1.13
#>   ..$ C: num [1:5] 1 1.03 1.26 2.32 2.35
#>   ..$ D: num [1:5] 0.025 0.0244 0.0314 0.0579 0.0465

Hope it helps!

Created on 2018-01-20 by the reprex package (v0.1.1.9000).

sgranitz · January 22, 2018, 12:59pm

Thanks @cderv I will test out this approach. Appreciate the insight.

markdly · February 1, 2018, 4:34am

I realise this is a really belated response but I thought it might still be worth sharing an example of using purrr::accumulate in this context. (I've used the same seed as cderv for comparison / reproducibility)

library(tidyverse)
set.seed(42)
x <- 0.01
y <- 0.015
z <- 0.03

df <- tibble(
  A = runif(5, 0, 2),
  B = c(1, rep(NA, 4)),
  C = c(1, rep(NA, 4)),
  D = c(B[1] * x + C[1] * y, rep(NA, 4))
)

foo <- transpose(df) %>% map( ~ as_tibble(.x))

f <- function(j, k, x, y, z) {
  tibble(
    A = k$A, 
    B = j$B * j$A - j$D, 
    C = max(j$C + z, B),
    D = B * x + C * y
  )
}

accumulate(foo, f, x, y, z) %>% map_df(~ .x)
#> # A tibble: 5 x 4
#>           A        B        C          D
#>       <dbl>    <dbl>    <dbl>      <dbl>
#> 1 1.8296121 1.000000 1.000000 0.02500000
#> 2 1.8741508 1.804612 1.804612 0.04511530
#> 3 0.5722791 3.337000 3.337000 0.08342500
#> 4 1.6608953 1.826270 3.367000 0.06876770
#> 5 1.2834910 2.964476 3.397000 0.08059976

# If you have many dataframes stored in a list then you could possibly do something like
multi_df <- list(df, df, df)

fs <- function(df) {
  df %>% 
    transpose() %>% 
    map( ~ as_tibble(.x)) %>%
    accumulate(f, x, y, z) %>% map_df(~ .x)
}

multi_df <- multi_df %>% map(fs)

sgranitz · February 21, 2018, 4:18pm

Thank you Mark, I will look more into accumulate!