Loop Across Column Then By Row (I think?)


#1

Hello,

I'm working on a simulation of sorts and need to populate a data frame over a certain period. Predefined probabilities determine the value computed in each cell, based on the corresponding values in the previous row. I have a spreadsheet example where I have done this successfully, and I am now trying to implement it in R.

My starting point would look something like this:

mydata <- tibble(
  year = c(0, seq(1:60)),
  event = c(0, rep(0.05, 12), rep(0.10, 12), 
            rep(0.15, 12), rep(0.20, 12), rep(0.25, 12)),
  A = c(1000, rep(0, 60)),
  B = c(rep(0, 61)),
  C = c(rep(0, 61)),
  D = c(rep(0, 61)),
  E = c(rep(0, 61))
)

Here, year is the year in the simulation and event is the probability that some event has happened in that year.

For the purposes of this example, let's say that the formulas to populate each of B:E were as follows:

# Formula for B

mydata$B[[i-1]]*(1-0.04) + mydata$B[[i-1]]*(1-mydata$event[[i]])

# Formula for C
mydata$B[[i-1]]*(mydata$event[[i]])

# Formula for D
mydata$C[[i-1]]*(1-0.04) + mydata$D[[i-1]]*(1-mydata$event[[i]])

# Formula for E
mydata$A[[i-1]]*0.04 + mydata$B[[i-1]]*mydata$event[[i]] + 
  (mydata$C[[i-1]] + mydata$D[[i]])*mydata$event[[i]] + mydata$E[[i-1]]

The A variable will be 0 for each year, except for year 0.

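As you can see in the formulas, each row depends on the previous row, and E also depends on D in the same row, so I need to fill the dataset across each row before moving down to the next one. I know how to loop down a single column; what I am seeking guidance on is how to get the formulas to run across each row first.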

Any advice this community could offer would be appreciated!


#2

First, you will get much better performance if you fill vectors rather than data frames. Below is an example with the information that you supplied. Because B, C, and D all start at zero and depend only on one another, they stay at zero for every year; the only value that changes is E, which picks up 40 from A in year 1 and then holds.

# initialize vectors
year = c(0, seq(1:60))
event = c(0, rep(0.05, 12), rep(0.10, 12), 
          rep(0.15, 12), rep(0.20, 12), rep(0.25, 12))
A = c(1000, rep(0, 60))
B = c(rep(0, 61))
C = c(rep(0, 61))
D = c(rep(0, 61))
E = c(rep(0, 61))

# loop across vectors
for (i in 2:length(year)){
  B[i] = B[i-1]*(1-0.04) + B[i-1]*(1-event[i])
  C[i] = B[i-1]*(event[i])
  D[i] = C[i-1]*(1-0.04) + D[i-1]*(1-event[i])
  E[i] = A[i-1]*0.04 + B[i-1]*event[i] + (C[i-1] + D[i])*event[i] + E[i-1]
}

# put everything into a tibble for subsequent analysis
library(tibble)
res = tibble(year = year,
             event = event,
             A = A,
             B = B,
             C = C,
             D = D,
             E = E)
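
If you want to check the performance point on your own machine, here is a rough sketch that runs the same recurrence twice: once writing into plain vectors and once writing cell by cell into a base data.frame. The function names fill_vectors and fill_data_frame are made up for this illustration, the repeat count of 200 is arbitrary, and timings will vary by machine and R version, so treat the numbers as indicative only.

```r
# Inputs from the original post
year  <- 0:60
event <- c(0, rep(c(0.05, 0.10, 0.15, 0.20, 0.25), each = 12))

# Hypothetical helper: run the recurrence on plain numeric vectors
fill_vectors <- function(year, event) {
  n <- length(year)
  A <- c(1000, rep(0, n - 1))
  B <- C <- D <- E <- numeric(n)
  for (i in 2:n) {
    B[i] <- B[i - 1] * (1 - 0.04) + B[i - 1] * (1 - event[i])
    C[i] <- B[i - 1] * event[i]
    D[i] <- C[i - 1] * (1 - 0.04) + D[i - 1] * (1 - event[i])
    E[i] <- A[i - 1] * 0.04 + B[i - 1] * event[i] +
      (C[i - 1] + D[i]) * event[i] + E[i - 1]
  }
  data.frame(year, event, A, B, C, D, E)
}

# Hypothetical helper: same recurrence, but writing into data frame cells
fill_data_frame <- function(year, event) {
  n <- length(year)
  df <- data.frame(year, event, A = c(1000, rep(0, n - 1)),
                   B = 0, C = 0, D = 0, E = 0)
  for (i in 2:n) {
    df$B[i] <- df$B[i - 1] * (1 - 0.04) + df$B[i - 1] * (1 - df$event[i])
    df$C[i] <- df$B[i - 1] * df$event[i]
    df$D[i] <- df$C[i - 1] * (1 - 0.04) + df$D[i - 1] * (1 - df$event[i])
    df$E[i] <- df$A[i - 1] * 0.04 + df$B[i - 1] * df$event[i] +
      (df$C[i - 1] + df$D[i]) * df$event[i] + df$E[i - 1]
  }
  df
}

res_vec <- fill_vectors(year, event)
res_df  <- fill_data_frame(year, event)

# Both approaches should produce the same table
stopifnot(isTRUE(all.equal(res_vec, res_df)))

# Rough timing over repeated runs; elapsed seconds for each approach
time_vec <- system.time(for (k in 1:200) fill_vectors(year, event))
time_df  <- system.time(for (k in 1:200) fill_data_frame(year, event))
print(c(vectors = time_vec[["elapsed"]], data_frame = time_df[["elapsed"]]))
```

With only 61 rows the difference may be small in absolute terms; it tends to matter more as the number of rows or repeated simulations grows.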

#3

Thanks for your help! Worked exactly as I was hoping.

I was filling a data frame directly in an attempt to keep things simple. Clearly I was wrong.


#4

I'm gonna throw another solution into the ring! I'm not sure if it'll end up being more performant than @hinkelman's solution (and it might be less intuitive for someone used to looping, although I'd argue it's worth learning the mapping approach more generally for R), but I've been meaning to learn tibbletime for a while, so here it is :wink:

Basically, tibbletime::rollify uses purrr to let you turn a regular function into a rolling one. So you can define your functions as they would work on a windowed version of your data frame, wrap them in rollify, and then use them with other tidyverse tools:

library(tidyverse)
library(tibbletime)

# also, here's a slightly shorter way to initialise your data frame!
mydata = tibble(
  year = 0:60,
  event = c(0,
    seq(from = 0.05, to = 0.25, by = 0.05) %>% rep(each = 12)),
  A = c(1000, rep(0, 60)),
  B = 0, C = 0, D = 0, E = 0)

# specify the rollified functions for each window
# (within a window of 2, index 1 is the previous row and index 2 is the current row)
calc_B = rollify(
  function(B, event) { B[1] * (1 - 0.04) + B[1] * (1 - event[2]) },
  window = 2)
calc_C = rollify(
  function(B, event) { B[1]* event[2] },
  window = 2)
calc_D = rollify(
  function(C, D, event) { C[1] * ( 1 - 0.04) + D[1] * (1 - event[2]) },
  window = 2)
calc_E = rollify(
  function(A, B, C, D, E, event) {
    A[1] * 0.04 + B[1] * event[2] + (C[1] + D[2]) * event[2] + E[1] },
  window = 2)

# and now you apply them to the data frame
mydata %>% mutate(
  B = calc_B(B, event),
  C = calc_C(B, event),
  D = calc_D(C, D, event),
  E = calc_E(A, B, C, D, E, event))
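
One caveat worth flagging: each rollified function only sees the column values as they stand when it is called, so a window over B reads the values already in that column rather than ones computed one row earlier in the same pass. If the recurrence genuinely has to feed each new row into the next, carrying the whole row forward as state is one way to stay in a functional style. Below is a rough base R sketch using Reduce with accumulate = TRUE. The name step_row is made up for this example, and the formulas are copied from the original post as written.

```r
# Inputs from the original post
year  <- 0:60
event <- c(0, rep(c(0.05, 0.10, 0.15, 0.20, 0.25), each = 12))

# Hypothetical helper: compute one row from the previous row (prev)
# and the current year's event probability (ev)
step_row <- function(prev, ev) {
  B <- prev[["B"]] * (1 - 0.04) + prev[["B"]] * (1 - ev)
  C <- prev[["B"]] * ev
  D <- prev[["C"]] * (1 - 0.04) + prev[["D"]] * (1 - ev)
  # E uses the current row's D, so D has to be computed first
  E <- prev[["A"]] * 0.04 + prev[["B"]] * ev +
    (prev[["C"]] + D) * ev + prev[["E"]]
  # A is 0 for every year after year 0
  c(A = 0, B = B, C = C, D = D, E = E)
}

# Year 0 state, then fold over the event probabilities for years 1 to 60,
# keeping every intermediate row
init <- c(A = 1000, B = 0, C = 0, D = 0, E = 0)
rows <- Reduce(step_row, event[-1], accumulate = TRUE, init)

# Bind the list of rows into one table alongside year and event
res <- data.frame(year = year, event = event, do.call(rbind, rows))
head(res)
```

Because each step receives the full previous row, the order of calculation within a row (B, C, D, then E) is explicit in step_row, which is the "across the row first, then down" behaviour the original question asks about.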