Retention Rate and age by month calculation

Dear Rstudio Community.

In creating a retention rate for donors to my company, I have run into a little bit of trouble.
I have used following guide ( Cohort Analysis with R ) and succeeded in all steps until the step called ‘Cohort Age’ , when I run the MAU mixed panel, I have an issue as certain cells are left as blanks.
I have narrowed my problem down to an issue with creating the age of donors by month. Since payments fall on official bank days, certain months are 25 days and certain months are more than 30 days. In the following example payments that fall on 12th of February and the 9th of March are categorized as though they are made in the same month since the distance between the payments is 25 days. How do I ensure that these payments are categorized as separate months so that I can continue on to creating my retention rate?
I’ve tried to make a bit of a mock code that I hope makes sense. I also attach a screenshot of my results when I run the data.
Many thanks to any suggestions that can help with this problem.

  • Naja
#Code:
library(lubridate)
mockdata <-tibble::tribble(
             ~Customer_ID, ~Payment_date, ~Year,   ~Join.date, ~Frequency, ~Cohort,
                    18445,  "09-11-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "11-06-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "12-12-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "12-09-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "11-07-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "10-04-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "13-08-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "10-10-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "09-03-2018",  2018, "12-02-2018",         11,       2,
                    18445,  "12-02-2018",  2018, "12-02-2018",         11,       2
             )
head(mockdata)
##############################################################
#Age
# Calculating the difference in days between the Payment.date column by join date column
mockdata$Age_by_Day <- as.numeric(difftime(mockdata$Payment_date,mockdata$Join.date, units = c("days")))
#######################################################################################
#ERROR! 
#######################################################################################
# There is no option for month, but getting the month from the days is simple division
# Dividing the days by 30 to get the number of months
mockdata$Age_by_Month <- floor(mockdata$Age_by_Day/30)

Hi,

This is my idea how to do it:

library(lubridate)
library(dplyr)

mockdata <-tibble::tribble(
  ~Customer_ID, ~Payment_date, ~Year,   ~Join.date, ~Frequency, ~Cohort,
  18445,  "09-11-2018",  2018, "12-02-2018",         11,       2,
  18445,  "11-06-2018",  2018, "12-02-2018",         11,       2,
  18445,  "12-12-2018",  2018, "12-02-2018",         11,       2,
  18445,  "12-09-2018",  2018, "12-02-2018",         11,       2,
  18445,  "11-07-2018",  2018, "12-02-2018",         11,       2,
  18445,  "10-04-2018",  2018, "12-02-2018",         11,       2,
  18445,  "13-08-2018",  2018, "12-02-2018",         11,       2,
  18445,  "10-10-2018",  2018, "12-02-2018",         11,       2,
  18445,  "09-03-2018",  2018, "12-02-2018",         11,       2,
  18445,  "12-02-2018",  2018, "12-02-2018",         11,       2
)

#Transform dates to date class
mockdata = mockdata %>% mutate(Payment_date= as.Date(Payment_date, format = "%d-%m-%Y"), 
                               Join.date = as.Date(Join.date, format = "%d-%m-%Y"))

#Calculate the difference in months
mockdata = mockdata %>% mutate(Age_by_Month = (year(Payment_date) - year(Join.date))*12 +
                                 (month(Payment_date) - month(Join.date)))

# A tibble: 10 x 7
   Customer_ID Payment_date  Year Join.date  Frequency Cohort Age_by_Month
         <dbl> <date>       <dbl> <date>         <dbl>  <dbl>        <dbl>
 1       18445 2018-11-09    2018 2018-02-12        11      2            9
 2       18445 2018-06-11    2018 2018-02-12        11      2            4
 3       18445 2018-12-12    2018 2018-02-12        11      2           10
 4       18445 2018-09-12    2018 2018-02-12        11      2            7
 5       18445 2018-07-11    2018 2018-02-12        11      2            5
 6       18445 2018-04-10    2018 2018-02-12        11      2            2
 7       18445 2018-08-13    2018 2018-02-12        11      2            6
 8       18445 2018-10-10    2018 2018-02-12        11      2            8
 9       18445 2018-03-09    2018 2018-02-12        11      2            1
10       18445 2018-02-12    2018 2018-02-12        11      2            0

I extracted the year and month for both payment date and join date. Then I use the difference in months and years (= 12 months) to calculate the total difference in months.

Hope this helps,
PJ

1 Like

Hey PJ,
This is a massive help, thank you. I can finally run the retention rate now, which is a huge relief.
Enjoy your day,
Naja

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.