rowwise iteration and mutation_at within a for loop

I'm attempting to do something that is clear in my head, but my fingers are not complying.
I have a dataframe with customer contract data over time. The structure looks like this:

   Account Type  MonthlyRate ContractStart ContractEnd
   <chr>   <chr>       <dbl> <date>        <date>     
 1 A       W              50 2018-01-01    2018-12-31 
 2 A       X              75 2018-03-15    2019-03-14 
 3 B       W              60 2018-02-28    2018-09-30 
 4 B       X              90 2018-05-12    2019-08-11 
 5 B       Y              45 2018-02-28    2018-09-30 
 6 C       Y              50 2018-07-31    2019-04-30 
 7 D       W              65 2019-01-01    2019-03-31 
 8 D       Y              50 2018-09-01    2019-05-31 
 9 D       Z             110 2018-08-22    2019-12-31 
10 E       Z             100 2018-10-01    2019-09-30

I want to create a report of revenues by calendar month, with the reporting period being the last day of each month. I would like to group by the Type field.
I've written a function that checks to see if the reporting date is within the contract period and, if so, the revenue equals MonthlyRate. If the contract ends mid-month, the revenue is prorated. My function looks like this:

# Assumes reporting date is at end of month and that dates are in POSIX format
monthly_revenue <- function(reporting_date, monthly_rate, start, end) {
  contract_int <- interval(start, end) # Contract interval
  # Calculate interval ending the last day of the month of contract end
  end_of_month <- end
  day(end_of_month) <- days_in_month(end)
  end_of_month_int <- interval(start, end_of_month)
  # Check if reporting date is within contract interval
  if(reporting_date %within% contract_int) {
    val <- 1 # bill for entire month
    # If not within interval, check if contract is in its last month
  } else if (reporting_date %within% end_of_month_int) {
    val <- day(end) / days_in_month(end) # prorate monthly charges
  } else { # Not within contract
    val <- 0 # zero revenue
  }
  return (val * monthly_rate)
}

Now for the fun part. I want to create a sequence of month-end dates and use them in a for loop to create new columns for each date. Each row of the column contains the monthly revenue for the respective service. I can then gather the columns and create my report/plot using group_by and summarize.
My code for this section is as follows:

reporting_date_seq <- seq(as.Date("2018-02-01"), length = 24, by = "1 month") -1
for(i in reporting_date_seq){
  revenue_for_month <- revenue_for_month %>%
    rowwise() %>%
#    mutate(i = monthly_revenue(as.Date(i, origin = "1970-01-01"), MonthlyRate, ContractStart, ContractEnd))
    mutate_at(c(as.Date(i, origin = "1970-01-01"), "MonthlyRate", "ContractStart", "ContractEnd"),
              monthly_revenue)
}

This results in an error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: argument "start" is missing, with no default.

I'm assuming the problem is due to my trying to use the for-loop index as a function argument, but I'm not sure. There's probably a simpler, more elegant way to do what I'm trying, but I just don't know what it is.
Any advice (other than "walk west until your hat floats") is welcomed.

Hi Ephraim, welcome!
It would be easier to help you if we could visualize your desired final output, could you share a sample of your data and desired output on a copy/paste friendly format?
If you don't know how, here is a nice blog post by Mara that explains how to do it

1 Like

Sure, let me give this a try. Here's a tibble that illustrates what I want the output to look like:

tibble::tribble(
  ~Account, ~Type, ~MonthlyRate, ~ContractStart, ~ContractEnd, ~6/30/2018, ~7/31/2018, ~8/31/2018, ~9/30/2018, ~10/31/2018,
       "A",   "W",        "$50",     "1/1/2018", "12/31/2018",   "$50.00",   "$50.00",   "$50.00",   "$50.00",    "$50.00",
       "A",   "X",        "$75",     "8/1/2018",  "3/14/2019",    "$0.00",    "$0.00",   "$75.00",   "$75.00",    "$75.00",
       "B",   "W",        "$60",    "2/28/2018",  "9/28/2018",   "$60.00",   "$60.00",   "$60.00",   "$56.00",     "$0.00",
       "B",   "X",        "$90",    "5/12/2018",  "8/11/2018",   "$90.00",   "$90.00",   "$31.94",    "$0.00",     "$0.00",
       "B",   "Y",        "$45",    "2/28/2018",  "9/30/2018",   "$45.00",   "$45.00",   "$45.00",   "$45.00",     "$0.00",
       "C",   "Y",        "$50",    "7/31/2018",  "4/30/2019",    "$0.00",    "$1.61",   "$50.00",   "$50.00",    "$50.00",
       "D",   "W",        "$65",     "1/1/2019",  "3/31/2019",    "$0.00",    "$0.00",    "$0.00",    "$0.00",     "$0.00",
       "D",   "Y",        "$50",     "9/1/2018",  "5/31/2019",    "$0.00",    "$0.00",    "$0.00",   "$50.00",    "$50.00",
       "D",   "Z",       "$110",    "8/22/2018", "10/15/2018",    "$0.00",    "$0.00",   "$31.94",  "$110.00",    "$53.23",
       "E",   "Z",       "$100",    "10/1/2018",  "9/30/2019",    "$0.00",    "$0.00",    "$0.00",    "$0.00",   "$100.00"
  )

I would then execute the following:

excluded_cols <- c("MonthlyRate", "ContractStart", "ContractEnd")
long_bookings <- bookings %>%
  select(-excluded_cols) %>% 
  gather(key = "reporting_date", value = "revenues", -Account, -Type)

This gives me the following:

> long_bookings
# A tibble: 50 x 4
   Account Type  reporting_date revenues
   <chr>   <chr> <chr>             <dbl>
 1 A       W     6/30/2018            50
 2 A       X     6/30/2018             0
 3 B       W     6/30/2018            60
 4 B       X     6/30/2018            90
 5 B       Y     6/30/2018            45
 6 C       Y     6/30/2018             0
 7 D       W     6/30/2018             0
 8 D       Y     6/30/2018             0
 9 D       Z     6/30/2018             0
10 E       Z     6/30/2018             0
# ... with 40 more rows

Finally, I'd group and summarize:

long_bookings %>%
  group_by(reporting_date, Type) %>%
  summarize(MonthRev = sum(revenues))

Which gives me:

# A tibble: 20 x 3
# Groups:   reporting_date [?]
   reporting_date Type  MonthRev
   <chr>          <chr>    <dbl>
 1 10/31/2018     W         50  
 2 10/31/2018     X         75  
 3 10/31/2018     Y        100  
 4 10/31/2018     Z        153. 
 5 6/30/2018      W        110  
 6 6/30/2018      X         90  
 7 6/30/2018      Y         45  
 8 6/30/2018      Z          0  
 9 7/31/2018      W        110  
10 7/31/2018      X         90  
11 7/31/2018      Y         46.6
12 7/31/2018      Z          0  
13 8/31/2018      W        110  
14 8/31/2018      X        107. 
15 8/31/2018      Y         95  
16 8/31/2018      Z         31.9
17 9/30/2018      W        106  
18 9/30/2018      X         75  
19 9/30/2018      Y        145  
20 9/30/2018      Z        110 
1 Like

If I understand you correctly this would be one way to do it

df <- data.frame(stringsAsFactors=FALSE,
                 Account = c("A", "A", "B", "B", "B", "C", "D", "D", "D", "E"),
                 Type = c("W", "X", "W", "X", "Y", "Y", "W", "Y", "Z", "Z"),
                 MonthlyRate = c(50, 75, 60, 90, 45, 50, 65, 50, 110, 100),
                 ContractStart = as.Date(c("2018-01-01", "2018-08-01", "2018-02-28", "2018-05-12",
                                           "2018-02-28", "2018-07-31", "2019-01-01", "2018-09-01",
                                           "2018-08-22", "2018-10-01")),
                 ContractEnd = as.Date(c("2018-12-31", "2019-03-14", "2018-09-28", "2018-08-11",
                                         "2018-09-30", "2019-04-30", "2019-03-31", "2019-05-31",
                                         "2018-10-15", "2019-09-30"))
)

head(df)
#>   Account Type MonthlyRate ContractStart ContractEnd
#> 1       A    W          50    2018-01-01  2018-12-31
#> 2       A    X          75    2018-08-01  2019-03-14
#> 3       B    W          60    2018-02-28  2018-09-28
#> 4       B    X          90    2018-05-12  2018-08-11
#> 5       B    Y          45    2018-02-28  2018-09-30
#> 6       C    Y          50    2018-07-31  2019-04-30

library(tidyverse)
library(lubridate)

as.Date(seq(as.Date("2018-02-01"), length = 24, by = "1 month") - 1) %>% 
    map_dfr(~mutate(df,
                    revenues = case_when(. %within% interval(ContractStart, ContractEnd) ~ MonthlyRate,
                                         . %within% interval(ContractStart, ceiling_date(ContractEnd, unit = "months") - 1) ~ (day(ContractEnd) / days_in_month(ContractEnd)) * MonthlyRate,
                                         TRUE ~ 0),
                    reporting_date = .)) %>% 
    group_by(reporting_date, Type) %>% 
    summarise(MonthRev = sum(revenues))
#> # A tibble: 96 x 3
#> # Groups:   reporting_date [24]
#>    reporting_date Type  MonthRev
#>    <date>         <chr>    <dbl>
#>  1 2018-01-31     W           50
#>  2 2018-01-31     X            0
#>  3 2018-01-31     Y            0
#>  4 2018-01-31     Z            0
#>  5 2018-02-28     W          110
#>  6 2018-02-28     X            0
#>  7 2018-02-28     Y           45
#>  8 2018-02-28     Z            0
#>  9 2018-03-31     W          110
#> 10 2018-03-31     X            0
#> # … with 86 more rows

Created on 2019-04-02 by the reprex package (v0.2.1.9000)

5 Likes

This one's excellent, and I love this one.

However, I'd like to point out that as.Date is not required here, as seq.Date returns objects of class Date. See below:

date_seq_1 <- as.Date(seq(from = as.Date("2018-02-01"), length.out = 24, by = "1 month") - 1)
date_seq_2 <- (seq(from = as.Date("2018-02-01"), length.out = 24, by = "months") - 1)

all.equal(target = date_seq_1,
          current = date_seq_2)
#> [1] TRUE

Created on 2019-04-02 by the reprex package (v0.2.1)

2 Likes

@andresrcs, wow! You not only gave me a working solution (I wasn't familiar with the map_dfr function), but you replaced my clunky function with a simple case statement. As Shakespeare said, "brevity is the soul of wit."
Thanks so much for your witty solution!

Yeah, for some reason when rendering the reprex the function complained about non date arguments and it was too late for me so I just put it inside as.Date and go to sleep.

1 Like

@patchman you're welcome! I'm glad to been able to help you

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.