How to remove NaNs from a simulated data series generated using a for loop

mohitarora · January 3, 2022, 11:47am

I am trying to simulate data series for 2 variables to do some further analysis. I am writing a for loop that generates the series according to the data generating process I have in mind. Each simulation has 1006 observations (1000 observations for each variable plus 3 lags for each variable), and the matrix A1 can take 100 values. The code does generate series for the 2 variables, but after a few observations (around the 15th or so) I start getting NaNs instead of values. For both variables, the values keep getting bigger until they eventually become NaN. Can someone help me with this? I want finite numeric values for further analysis rather than NaNs. Thanks.

N <- 100
A1_sequence <- seq(from=0.5,to=1.5,length.out = N)
A1 <- array(data = A1_sequence, dim=c(2,2,100))
A2 <- matrix(c(0.2,1.1,-0.6,0.2), nrow=2, ncol=2)
A3 <- matrix(c(-0.8,1.2,-0.4,0.4), nrow=2, ncol=2)
A4 <- matrix(c(-0.01, 0.02,-0.03,0.05), nrow=2, ncol=2)
p <- 3 # Number of lags
N1 <- 1000+2*p # Number of observations in each simulation
k <- 2 #Number of endogenous variables
x <- matrix(0, k, N1)
myeps1 <- 0.25*rnorm(k)
for( i in (p+1):N1){
  for (j in 1:N){
    x[,i] <- A1[,,j]%*%x[,i-1] - A2%*%x[,i-2] - A3%*%x[,i-3] - A4%*%((x[,i-1])^3) + myeps1
  }
}
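Written out, the recursion in the loop above is the following. Here t is the column index i, j indexes the slice of A1 in use, $x_{t-1}^{\circ 3}$ is the elementwise cube (the `^3` in the code), and $\varepsilon$ is the single vector myeps1, drawn once before the loop and added at every step. The first p = 3 columns of x start at zero.

$$
x_t = A_1^{(j)} x_{t-1} - A_2 x_{t-2} - A_3 x_{t-3} - A_4 \, x_{t-1}^{\circ 3} + \varepsilon,
\qquad t = p+1, \dots, N_1, \qquad x_1 = x_2 = x_3 = 0 .
$$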

FJCC · January 3, 2022, 4:54pm

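There is no flaw in your code causing the NaNs; it is the logic of your calculation. Each term of x depends on the cube of the previous term, plus some adjustments using terms farther back in the sequence. Once the values of x have an absolute value much above 1, they start to grow very rapidly, and soon the cube exceeds the largest finite number R can store (.Machine$double.xmax, about 1.8e308) and overflows to Inf. The following matrix products then typically end up adding +Inf and -Inf, which R evaluates to NaN, and that NaN carries over to every later observation. You can only fix this by changing the model of your simulation.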
Also, x only keeps the values computed for j = N. The value of j increments from 1 to N, but each successive calculation overwrites the previous one in the same column x[,i].
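A quick way to see the overflow-then-NaN mechanism is to reproduce it with plain numbers. This is a minimal sketch: the magnitude 1e103 is an arbitrary illustrative value chosen so that its cube exceeds the largest finite double, and `c(Inf, -Inf)` is a hypothetical overflowed observation whose two components have opposite signs.

```r
# Largest finite double-precision number R can store (about 1.8e308)
big <- .Machine$double.xmax

# Cubing a value of about 1e103 or more exceeds that limit and overflows to Inf
cubed <- (1e103)^3

# Adding infinities of opposite sign is undefined, so R returns NaN
undefined <- Inf - Inf

# The same happens inside a matrix product: with the thread's A4,
# each row combines +Inf and -Inf, so both entries come out NaN
A4 <- matrix(c(-0.01, 0.02, -0.03, 0.05), nrow = 2, ncol = 2)
overflowed <- c(Inf, -Inf)   # hypothetical overflowed observation
prod_A4 <- A4 %*% overflowed

# Any arithmetic involving NaN yields NaN, so the series never recovers
propagated <- 0.5 * NaN + 1
```

Once one observation is NaN, every later column that depends on it is NaN as well, which matches the runs of NaN in the output further down the thread.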

mohitarora · January 3, 2022, 5:03pm

@FJCC, thanks for your reply. This is very helpful. Regarding your second comment, how can I fix the overwriting of previous values? I need to run the loop over i (1000 times) for each of the 100 values that the 2x2 matrix A1 takes. The array A1 is defined above in my code.
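Since the question hinges on the 100 values A1 is meant to take, it may be worth checking how `array()` fills A1. `array()` fills in column-major order and recycles data that is too short: A1_sequence has 100 values but `dim = c(2, 2, 100)` needs 400, so each slice `A1[, , j]` holds four consecutive values of the sequence and the pattern repeats every 25 slices. The sketch below checks this. `A1_alt`, where every entry of slice j equals `A1_sequence[j]`, is only one hypothetical reading of the intended design; whether the model needs that is an assumption.

```r
N <- 100
A1_sequence <- seq(from = 0.5, to = 1.5, length.out = N)
A1 <- array(data = A1_sequence, dim = c(2, 2, N))

# Slice 1 holds the first four values of the sequence (column-major order)
slice1_ok <- identical(A1[, , 1], matrix(A1_sequence[1:4], nrow = 2))

# The 100 values are recycled to fill 400 cells, so slice 26 repeats slice 1
repeats <- identical(A1[, , 1], A1[, , 26])

# Number of distinct 2x2 matrices actually stored in A1
n_distinct <- nrow(unique(t(apply(A1, 3, as.vector))))

# Hypothetical alternative: every entry of slice j equals A1_sequence[j]
A1_alt <- array(data = rep(A1_sequence, each = 4), dim = c(2, 2, N))
```

With the original construction, `n_distinct` is 25 rather than 100; the values still increase with j within each block of 25 slices.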

FJCC · January 3, 2022, 5:19pm

You can make x an array and store the N different results. I do this in the code below and show the results for i = 1:20 for j = 1 and j = 100. Since the values of A1 are smaller for j = 1 than for j = 100, the calculated values grow a little more slowly for j = 1, but the i = 20 value is about 1e+190 in absolute value, so the next step will be NaN even for j = 1.

N <- 100
A1_sequence <- seq(from=0.5,to=1.5,length.out = N)
A1 <- array(data = A1_sequence, dim=c(2,2,100))
A2 <- matrix(c(0.2,1.1,-0.6,0.2), nrow=2, ncol=2)
A3 <- matrix(c(-0.8,1.2,-0.4,0.4), nrow=2, ncol=2)
A4 <- matrix(c(-0.01, 0.02,-0.03,0.05), nrow=2, ncol=2)
p <- 3 # Number of lags
N1 <- 1000+2*p # Number of observations in each simulation
k <- 2 #Number of endogenous variables
x <- array(data = 0, c(k, N1, N))
myeps1 <- 0.25*rnorm(k)
for( i in (p+1):N1){
  for (j in 1:N){
    x[,i,j] <- A1[,,j]%*%x[,i-1,j] - A2%*%x[,i-2,j] - A3%*%x[,i-3,j] - A4%*%((x[,i-1,j])^3) + myeps1  
  }
}
x[,1:20,1]
#>      [,1] [,2] [,3]        [,4]       [,5]       [,6]       [,7]        [,8]
#> [1,]    0    0    0  0.03903182 -0.1454364 -0.5779590 -1.0771888 -1.52723046
#> [2,]    0    0    0 -0.38873817 -0.5720410 -0.7220469 -0.6607307 -0.06567585
#>           [,9]     [,10]    [,11]    [,12]      [,13]      [,14]     [,15]
#> [1,] -1.726565 -1.080278 1.190540 4.031453  6.4058654   8.372963 -26.71532
#> [2,]  1.168073  2.623233 3.098518 2.608855 -0.8293063 -10.414098  30.33007
#>           [,16]     [,17]         [,18]         [,19]          [,20]
#> [1,]   645.6901 -29721583  3.189299e+21 -3.962298e+63  7.600320e+189
#> [2,] -1026.1268  48638138 -5.227981e+21  6.495696e+63 -1.245984e+190
x[,1:20, 100]
#>      [,1] [,2] [,3]        [,4]       [,5]      [,6]      [,7]       [,8]
#> [1,]    0    0    0  0.03903182 -0.4845456 -2.295505 -8.015300 -32.351066
#> [2,]    0    0    0 -0.38873817 -0.9111502 -2.397584 -5.627079   2.467767
#>           [,9]     [,10]         [,11]         [,12]          [,13] [,14] [,15]
#> [1,] -386.5318   7492132 -5.152917e+19  1.671953e+58 -5.710498e+173   NaN   NaN
#> [2,]  645.5113 -12293287  8.448013e+19 -2.740981e+58  9.361703e+173   NaN   NaN
#>      [,16] [,17] [,18] [,19] [,20]
#> [1,]   NaN   NaN   NaN   NaN   NaN
#> [2,]   NaN   NaN   NaN   NaN   NaN

Created on 2022-01-03 by the reprex package (v2.0.1)
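To find where each simulated path breaks down without reading printed columns, a small helper can report the first observation that is not finite. The name `first_nonfinite_col` is hypothetical; it applies `is.finite()` to one k x N1 slice of the x array and returns the first column containing Inf, -Inf, NaN or NA.

```r
# Return the index of the first column of m that contains Inf, -Inf, NaN or NA,
# or NA_integer_ if every column is finite
first_nonfinite_col <- function(m) {
  bad <- which(colSums(!is.finite(m)) > 0)
  if (length(bad) > 0) bad[1] else NA_integer_
}

# Check on a hand-made 2 x 3 matrix: column 2 holds NaN, column 3 holds Inf
demo <- matrix(c(1, 2, 3, NaN, Inf, 5), nrow = 2)
demo_first <- first_nonfinite_col(demo)

# All-finite input returns NA
clean_first <- first_nonfinite_col(matrix(1:6, nrow = 2))

# Applied to the simulated array x from the code above, one entry per j:
# breakdown <- vapply(seq_len(dim(x)[3]),
#                     function(j) first_nonfinite_col(x[, , j]),
#                     integer(1))
```

For the j = 100 run printed above, the first non-finite column is 14.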

mohitarora · January 6, 2022, 2:04pm

@FJCC, thanks again for your help. When I reduce the range of values that A1 can take, the loop you suggested above works. For example, the code below does not give NaNs (N, A2 and A3 are defined as before).

A1_sequence <- seq(from=0.5,to=0.55,length.out = N)
A1 <- array(data = A1_sequence, dim=c(2,2,100))
A4 <- matrix(c(-0.01, 0.02,-0.03,0.05), nrow=2, ncol=2)
p <- 3 # Number of lags
N1 <- 1000+2*p # Number of observations in each simulation
k <- 2 #Number of endogenous variables
x <- array(data = 0, c(k, N1, N))
myeps1 <- 0.25*rnorm(k)


for( i in (p+1):N1){
  for (j in 1:N){
    x[,i,j] <- A1[,,j]%*%x[,i-1,j] - A2%*%x[,i-2,j] - A3%*%x[,i-3,j] - A4%*%((x[,i-1,j])^3) + myeps1
  }
}

However, when I put the loop over i inside the loop over j, as in:

for (j in 1:N){
  for( i in (p+1):N1){
    x[,i,j] <- A1[,,j]%*%x[,i-1,j] - A2%*%x[,i-2,j] - A3%*%x[,i-3,j] - A4%*%((x[,i-1,j])^3) + myeps1
  }
}

In this case, I get NaNs again in the generated x series. I think having the loop over i inside the loop over j is better, since j can take N different values and each value of j has N1 observations. Do you have any thoughts on this? Thanks
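One rough way to explore why narrowing the range of A1 helps is to look at the linear part of the recursion, ignoring the cubic A4 term and the constant myeps1. That linear part is a VAR(3) with coefficient matrices `A1[, , j]`, `-A2` and `-A3`, and it is stable when every eigenvalue of its companion matrix has modulus below 1. This is a heuristic under those assumptions, not a guarantee: it only describes behaviour while values stay small, and the cubic term can still push the series to overflow once they get large. The helper names `companion_radius` and `radius_for` are hypothetical.

```r
# Spectral radius of the companion matrix of x_t = Phi1 x_{t-1} + Phi2 x_{t-2} + Phi3 x_{t-3}
# (linear part only: the cubic A4 term and the constant shock are ignored)
companion_radius <- function(Phi1, Phi2, Phi3) {
  k <- nrow(Phi1)
  I <- diag(k)
  Z <- matrix(0, k, k)
  companion <- rbind(
    cbind(Phi1, Phi2, Phi3),
    cbind(I,    Z,    Z),
    cbind(Z,    I,    Z)
  )
  max(Mod(eigen(companion, only.values = TRUE)$values))
}

A2 <- matrix(c(0.2, 1.1, -0.6, 0.2), nrow = 2, ncol = 2)
A3 <- matrix(c(-0.8, 1.2, -0.4, 0.4), nrow = 2, ncol = 2)

# Sanity check: with Phi1 = 0.5 * I and no further lags, the radius is 0.5
check_radius <- companion_radius(0.5 * diag(2), matrix(0, 2, 2), matrix(0, 2, 2))

# Radius for every slice of A1, built the same way as in the thread
# (values above 1 indicate an explosive linear part)
radius_for <- function(from, to, N = 100) {
  A1 <- array(seq(from, to, length.out = N), dim = c(2, 2, N))
  vapply(seq_len(N), function(j) companion_radius(A1[, , j], -A2, -A3), numeric(1))
}
wide   <- radius_for(0.5, 1.5)
narrow <- radius_for(0.5, 0.55)
```

Comparing `summary(wide)` with `summary(narrow)` gives a first indication of whether the narrower range moves the linear part towards stability.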

FJCC · January 6, 2022, 6:15pm

I do not see any difference in the result between the two arrangements of the loops.

N <- 100
A1_sequence <- seq(from=0.5,to=0.55,length.out = N)
A1 <- array(data = A1_sequence, dim=c(2,2,100))
A2 <- matrix(c(0.2,1.1,-0.6,0.2), nrow=2, ncol=2)
A3 <- matrix(c(-0.8,1.2,-0.4,0.4), nrow=2, ncol=2)
A4 <- matrix(c(-0.01, 0.02,-0.03,0.05), nrow=2, ncol=2)
p <- 3 # Number of lags
N1 <- 1000+2*p # Number of observations in each simulation
k <- 2 #Number of endogenous variables
x <- array(data = 0, c(k, N1, N))
x2 <- array(data = 0, c(k, N1, N))
myeps1 <- 0.25*rnorm(k)


for( i in (p+1):N1){
  for (j in 1:N){
    x[,i,j] <- A1[,,j]%*%x[,i-1,j] - A2%*%x[,i-2,j] - A3%*%x[,i-3,j] - A4%*%((x[,i-1,j])^3) + myeps1  
  }
}

for (j in 1:N){
  for( i in (p+1):N1){ 
    x2[,i,j] <- A1[,,j]%*%x2[,i-1,j] - A2%*%x2[,i-2,j] - A3%*%x2[,i-3,j] - A4%*%((x2[,i-1,j])^3) + myeps1  
  }
}

identical(x,x2)
#> [1] TRUE

Created on 2022-01-06 by the reprex package (v2.0.1)
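The two loop orders agree because slice `x[, , j]` only ever reads earlier columns of the same slice; no value of j depends on another. A sketch that makes this explicit simulates one path per A1 matrix in a function and checks that the result matches the nested loops. `simulate_path` is a hypothetical name, `set.seed()` is added only so the check is reproducible, and the code keeps the thread's choice of adding the same myeps1 vector at every step (drawing a fresh shock each period would be a change to the model).

```r
set.seed(1)
N  <- 100
A1 <- array(data = seq(from = 0.5, to = 0.55, length.out = N), dim = c(2, 2, N))
A2 <- matrix(c(0.2, 1.1, -0.6, 0.2), nrow = 2, ncol = 2)
A3 <- matrix(c(-0.8, 1.2, -0.4, 0.4), nrow = 2, ncol = 2)
A4 <- matrix(c(-0.01, 0.02, -0.03, 0.05), nrow = 2, ncol = 2)
p  <- 3                # Number of lags
N1 <- 1000 + 2 * p     # Number of observations in each simulation
k  <- 2                # Number of endogenous variables
myeps1 <- 0.25 * rnorm(k)

# Simulate one k x N1 path for a single 2x2 matrix A1j.
# The first p columns stay 0; the three lags are hard-coded as in the thread.
simulate_path <- function(A1j, A2, A3, A4, eps, N1, p) {
  x <- matrix(0, nrow = length(eps), ncol = N1)
  for (i in (p + 1):N1) {
    x[, i] <- A1j %*% x[, i - 1] - A2 %*% x[, i - 2] - A3 %*% x[, i - 3] -
      A4 %*% (x[, i - 1]^3) + eps
  }
  x
}

# One path per slice of A1, stored in the same k x N1 x N layout as the thread
x_fun <- array(data = 0, dim = c(k, N1, N))
for (j in 1:N) {
  x_fun[, , j] <- simulate_path(A1[, , j], A2, A3, A4, myeps1, N1, p)
}

# Reference result: the nested loops from the thread (i outer, j inner)
x_loop <- array(data = 0, dim = c(k, N1, N))
for (i in (p + 1):N1) {
  for (j in 1:N) {
    x_loop[, i, j] <- A1[, , j] %*% x_loop[, i - 1, j] - A2 %*% x_loop[, i - 2, j] -
      A3 %*% x_loop[, i - 3, j] - A4 %*% ((x_loop[, i - 1, j])^3) + myeps1
  }
}
```

`identical(x_fun, x_loop)` should be TRUE, for the same reason that FJCC's two loop orders give identical results.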

mohitarora · January 7, 2022, 7:04pm

@FJCC, thanks! Your replies have been very helpful.

Best,
Mohit

system · January 28, 2022, 7:04pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.