Complete a column using the last complete value and values from another column

I did the forecast of the monthly variation of an indice and, base on the las observed indice and the forecasted variations, I want to get the corresponding forecast of the indice.

I have four variables:

  • date: the month
  • f: indicates if the observation is an observed value ("observado") or a forecast result. When it is a forecasted result this variable indicate the model used to get the forecast ("auto.arima", "ets", "bats")
  • var: the monthly variation
  • indice: target variable that should be fill using this formula indice_t = indice_{t-1} * (1 + var_t)

Example data

df <- structure(list(date = structure(c(18109, 18140, 18170, 18201, 
18231, 18262, 18293, 18140, 18170, 18201, 18231, 18262, 18293, 
18140, 18170, 18201, 18231, 18262, 18293), class = "Date"), f = c("observado", 
"auto.arima", "auto.arima", "auto.arima", "auto.arima", "auto.arima", 
"auto.arima", "ets", "ets", "ets", "ets", "ets", "ets", "bats", 
"bats", "bats", "bats", "bats", "bats"), var = c(0.00164468135353065, 
0.000789326024367794, 0.000378818408445376, 0.000181804960367329, 
8.72530026980781e-05, 4.18750207059747e-05, 2.00969285285633e-05, 
-0.00319710269210977, -0.00322906819558889, -0.00326022342842121, 
-0.00329058892957686, -0.00332018471739832, -0.00334903030279746, 
-0.000505340580329043, -0.00101461802017405, -0.00101461802017405, 
-0.00101461802017405, -0.00101461802017405, -0.00101461802017405
), indice = c(118.759, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -19L))

This is my firs question here, thanks in advance for your answer!

Edit

I edited the formula beacuse I was not exactly what I wanted to do.

At last I found a way to get the job done. But, even I keep the code in the tidyverse ground, it is not that clear and elegant as it could be.

df %>%
  mutate(
    indice = ifelse(
      is.na(indice),
      last(indice[f == "observado"]),
      indice)
    ) %>%
  group_by(f) %>%
  mutate(
    var_cum = (1+ var),
    var_cum = cumprod(var_cum),
    indice = ifelse(
      !f == "observado",
      indice * var_cum,
      indice
      )
    )

       date          f           var   indice   var_cum
1  2019-08-01  observado  1.644681e-03 118.7590 1.0016447
2  2019-09-01 auto.arima  7.893260e-04 118.8527 1.0007893
3  2019-10-01 auto.arima  3.788184e-04 118.8978 1.0011684
4  2019-11-01 auto.arima  1.818050e-04 118.9194 1.0013505
5  2019-12-01 auto.arima  8.725300e-05 118.9298 1.0014378
6  2020-01-01 auto.arima  4.187502e-05 118.9347 1.0014798
7  2020-02-01 auto.arima  2.009693e-05 118.9371 1.0014999
8  2019-09-01        ets -3.197103e-03 118.3793 0.9968029
9  2019-10-01        ets -3.229068e-03 117.9971 0.9935842
10 2019-11-01        ets -3.260223e-03 117.6124 0.9903448
11 2019-12-01        ets -3.290589e-03 117.2253 0.9870860
12 2020-01-01        ets -3.320185e-03 116.8361 0.9838087
13 2020-02-01        ets -3.349030e-03 116.4449 0.9805139
14 2019-09-01       bats -5.053406e-04 118.6990 0.9994947
15 2019-10-01       bats -1.014618e-03 118.5786 0.9984806
16 2019-11-01       bats -1.014618e-03 118.4582 0.9974675
17 2019-12-01       bats -1.014618e-03 118.3381 0.9964554
18 2020-01-01       bats -1.014618e-03 118.2180 0.9954444
19 2020-02-01       bats -1.014618e-03 118.0980 0.9944344
1 Like

Hi, and welcome!

Thanks for the reproducible example, called a reprex. These make things so much more efficient for everyone.

Help me clarify my understanding. Taking the first few rows

> df
# A tibble: 19 x 4
   date       f                 var indice
   <date>     <chr>           <dbl>  <dbl>
 1 2019-08-01 observado   0.00164     119.
 2 2019-09-01 auto.arima  0.000789     NA 
 3 2019-10-01 auto.arima  0.000379     NA 

Should row 2 be

2 2019-09-01 auto.arima  0.000789     0.09389 

and row 3

 3 2019-10-01 auto.arima  0.000379     3.558e-05 

That is indice at t_i should equal indice at t_{i -1} * var at t_i?

1 Like

Hi technocrat, tanks for your help!!

Actually, there was a typo in my formula and I just edited it. I handle myself to do the job, but if you have a recomendation to do it with a better approach, it will be nice.

2 Likes

Great. Please mark the solution for the benefit of those to follow.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.