Generating additional rows based on a condition

I have a data frame like DF below, which will be imported directly from a database (as a tibble).

library(tidyverse)
library(lubridate)


date_until <- dmy("31.05.2019")
date_val  <- dmy("30.06.2018")

DF <-  data.frame( date_bal   = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
                   department = c("A","A","A","B","B"),
                   amount     = c(10,20,30,40,50)
)

DF <- DF %>%
  as_tibble()
DF

It represents the amount of money spent by each department in a specific month. My task is to project how much money each department will spend in the following months, up to a specified future date (in this case date_until = 31.05.2019).

I would like to use the tidyverse to generate additional rows for each department, where the first column, date_bal, would be a sequence of dates from the last date in the original DF up to the predefined date_until. Then I would like to add a column called "DIFF" that holds the difference between date_bal and date_val, where date_val is also predefined. My final result would look like this:

I have managed to do this in the following way:
a) first filter data from DF for department A
b) Create another data frame, DF2, and populate it with a date sequence from min(date_bal) in a) to date_until
c) Merge data frames from a) and b) and then add calculated columns using mutate
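
For illustration, steps a) to c) could look roughly like the sketch below. It is a reconstruction under assumptions, not necessarily the exact code used: the dates are assumed to be monthly month-end dates, the difference is assumed to be measured in days, and the names DF_A, month_ends and result_A are hypothetical.

```r
library(dplyr)
library(lubridate)

date_until <- dmy("31.05.2019")
date_val   <- dmy("30.06.2018")

DF <- tibble(
  date_bal   = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
  department = c("A", "A", "A", "B", "B"),
  amount     = c(10, 20, 30, 40, 50)
)

# a) keep only department A
DF_A <- DF %>%
  filter(department == "A")

# b) month-end dates from min(date_bal) to date_until (assumed monthly):
#    take the first day of each following month and step back one day
month_ends <- seq(
  floor_date(min(DF_A$date_bal), "month") + months(1),
  floor_date(date_until, "month") + months(1),
  by = "month"
) - 1

DF2 <- tibble(date_bal = month_ends, department = "A")

# c) merge both data frames and add the calculated column
#    (difference in days is an assumption; the unit is not specified)
result_A <- full_join(DF_A, DF2, by = c("date_bal", "department")) %>%
  arrange(date_bal) %>%
  mutate(DIFF = as.numeric(date_bal - date_val))

result_A
```

The projected months have no amount yet, so amount is NA for every row that comes only from DF2.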

Since I will have to repeat this procedure for many departments, I wonder whether it's possible to add the rows (create the date sequence) in the existing DF, without creating a second DF and then merging.

Thanks in advance for your help and time.


You might take a look at the padr package for adding missing rows to your time series data.

library(tidyverse)
#> Warning: replacing previous import 'dplyr::vars' by 'rlang::vars' when
#> loading 'dbplyr'
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(padr)

date_until <- dmy("31.05.2019")
date_val  <- dmy("30.06.2018")

DF <-  data.frame( date_bal   = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
                   department = c("A","A","A","B","B"),
                   amount     = c(10,20,30,40,50)
)

DF <- DF %>%
  as_tibble()

padded_df <- DF %>%
  group_by(department) %>%
  pad() %>%
  fill_by_value(value = 0)
#> pad applied on the interval: day

head(padded_df)
#> # A tibble: 6 x 3
#> # Groups:   department [1]
#>   date_bal   department amount
#>   <date>     <fct>       <dbl>
#> 1 2018-04-30 A              10
#> 2 2018-05-01 A               0
#> 3 2018-05-02 A               0
#> 4 2018-05-03 A               0
#> 5 2018-05-04 A               0
#> 6 2018-05-05 A               0

Created on 2019-06-13 by the reprex package (v0.3.0)

From there, you can use mutate to create your new variable with the difference.
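
As an alternative that does not use padr, the sketch below appends the future rows per department and then adds the difference with mutate. It rests on several assumptions that are not in the original posts: the dates are month-end dates on a monthly grid, the difference is measured in days, and the projected amounts are left as NA. The helper month_ends_after() and the name projected are hypothetical.

```r
library(dplyr)
library(lubridate)

date_until <- dmy("31.05.2019")
date_val   <- dmy("30.06.2018")

DF <- tibble(
  date_bal   = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
  department = c("A", "A", "A", "B", "B"),
  amount     = c(10, 20, 30, 40, 50)
)

# Hypothetical helper: month-end dates strictly after `from`, up to the
# month of `to`. A month end is the first day of the next month minus one day.
month_ends_after <- function(from, to) {
  first <- floor_date(from, "month") + months(2)
  last  <- floor_date(to, "month") + months(1)
  if (first > last) {
    return(as.Date(character(0)))  # nothing left to add
  }
  seq(first, last, by = "month") - 1
}

projected <- DF %>%
  group_by(department) %>%
  # per department, append one row for each missing month end;
  # columns other than date_bal (here: amount) become NA
  group_modify(~ bind_rows(
    .x,
    tibble(date_bal = month_ends_after(max(.x$date_bal), date_until))
  )) %>%
  ungroup() %>%
  # difference to date_val in days (the unit is an assumption)
  mutate(DIFF = as.numeric(date_bal - date_val))

projected
```

Because the rows are appended inside group_modify, each department starts its sequence from its own last date_bal, and no separate data frame has to be built and merged by hand.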