Generating additional rows based on a condition

I have a data frame like DF below which will be imported directly from the database (as tibble).

library(tidyverse)
library(lubridate)


date_until <- dmy("31.05.2019")
date_val  <- dmy("30.06.2018")

DF <-  data.frame( date_bal   = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
                   department = c("A","A","A","B","B"),
                   amount     = c(10,20,30,40,50)
)

DF <- DF %>%
  as_tibble()
DF

It represents the amount of money spent by each department in a specific month. My task is to project how much money will be spent by each department in the following months until a specified date in the future (in this case date_until=31.05.2019)

I would like to use tidyverse in order to generate additional rows for each department where the first column date_bal would be a sequence of dates from the last one from "original" DF up until date_until which is predefined. Then I would like to add additional column called "DIFF" which would represent the difference between DATE_BAL and DATE_VAL, where DATE_VAL is also predefined. My final result would look like this:

I have managed to do this in the following way:
a) first filter data from DF for department A
b) Create another DF2 by populating it with date sequence from min(dat_bal) to date_until from a)
c) Merge data frames from a) and b) and then add calculated columns using mutate

Since I will have to repeat this procedure for many departments I wonder if it's possible to add rows (create date sequence) in existing DF (without creating a second DF and then merging).

Thanks in advance for your help and time.

1 Like

You might take a look at the padr package for adding missing rows to your timeseries data

library(tidyverse)
#> Warning: replacing previous import 'dplyr::vars' by 'rlang::vars' when
#> loading 'dbplyr'
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(padr)

date_until <- dmy("31.05.2019")
date_val  <- dmy("30.06.2018")

DF <-  data.frame( date_bal   = as.Date(c("2018-04-30", "2018-05-31", "2018-06-30", "2018-05-31", "2018-06-30")),
                   department = c("A","A","A","B","B"),
                   amount     = c(10,20,30,40,50)
)

DF <- DF %>%
  as_tibble()

padded_df <- DF %>%
  group_by(department) %>%
  pad() %>%
  fill_by_value(value = 0)
#> pad applied on the interval: day

head(padded_df)
#> # A tibble: 6 x 3
#> # Groups:   department [1]
#>   date_bal   department amount
#>   <date>     <fct>       <dbl>
#> 1 2018-04-30 A              10
#> 2 2018-05-01 A               0
#> 3 2018-05-02 A               0
#> 4 2018-05-03 A               0
#> 5 2018-05-04 A               0
#> 6 2018-05-05 A               0

Created on 2019-06-13 by the reprex package (v0.3.0)

From there, you can use mutate to create your new variable with the difference.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.