Hello everyone, this is my first post. I've been using R for several years, and one of my projects is forecasting the enrollment funnel for a school's programs. It's reasonably accurate, but I thought I'd ask whether anyone here has done something similar or has suggestions for improving accuracy that I might not have considered.
The premise of any marketing funnel is that not everyone moves from one stage to the next. For a school, we look at it in terms of applications, so our marketing funnel looks like this:
Started Applications >= Completed Applications >= Admits >= Deposits >= Enrollments
I approached this with a hierarchical time series model, where each data point is a snapshot of the funnel on the first of the month. The monthly observations form the bottom level of the hierarchy, broken down into mutually exclusive groups: enrolled, deposited but not enrolled, admitted but not deposited, and so on. The sum of each row is the total number of applications at all stages of the funnel.
Here is how one year of data looks (the `Non adm` and `Non app` columns are zero-filled placeholders that pad the bottom level out to 16 series so the hierarchy forms a balanced binary tree):
Enrollments | Deposits Not enrolled | Non-deposits Enrolled | Admits Not deposited | Non adm1 | Non adm2 | Non adm3 | Applied Not admitted | Non app1 | Non app2 | Non app3 | Non app4 | Non app5 | Non app6 | Non app7 | Started app Not applied |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 16 | 0 | 28 | 0 | 0 | 0 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 96 |
0 | 18 | 0 | 26 | 0 | 0 | 0 | 44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 |
0 | 23 | 0 | 38 | 0 | 0 | 0 | 95 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 549 |
0 | 27 | 0 | 53 | 0 | 0 | 0 | 146 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 857 |
0 | 28 | 0 | 173 | 0 | 0 | 0 | 296 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 987 |
0 | 51 | 0 | 361 | 0 | 0 | 0 | 353 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1036 |
16 | 111 | 0 | 529 | 0 | 0 | 0 | 376 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 938 |
116 | 128 | 0 | 512 | 0 | 0 | 0 | 388 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 941 |
247 | 89 | 0 | 491 | 0 | 0 | 0 | 377 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 987 |
295 | 55 | 0 | 492 | 0 | 0 | 0 | 387 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1043 |
317 | 56 | 0 | 499 | 0 | 0 | 0 | 385 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1059 |
369 | 37 | 0 | 496 | 0 | 0 | 0 | 379 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1053 |
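Because the bottom-level groups are mutually exclusive, each row of the table sums to the total number of started applications in the funnel at that snapshot. A quick base-R sanity check using the seventh row above:

```r
# Seventh snapshot row from the table above; the zero columns are the
# placeholder series that pad the hierarchy out to 16 leaves
snapshot <- c(16, 111, 0, 529, 0, 0, 0, 376, 0, 0, 0, 0, 0, 0, 0, 938)

# The groups are mutually exclusive, so the row sum is the total number of
# applications started at any stage of the funnel
sum(snapshot)  # 1970
```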
Here's the code I use to build the hierarchical series and make a forecast (`CAS_Time_Series` holds the monthly snapshot data shown above):

```r
library(hts)  # also loads the forecast package

# Monthly snapshots from Jan 2014 through Aug 2019
CASts <- ts(CAS_Time_Series, start = c(2014, 1), end = c(2019, 8), frequency = 12)

# Balanced binary hierarchy: 2 nodes at the top level, then 4, 8, and 16 leaves
nodes <- list(2, rep(2, 2), rep(2, 4), rep(2, 8))
y <- hts(CASts, nodes = nodes)

# Forecast 4 months ahead, fitting an ETS model to each series
allf <- forecast(y, h = 4, FUN = function(x) ets(x))

# Write out every level of the forecast hierarchy
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file = "MAY_forecastCAS_ets.csv", sep = ",", row.names = FALSE)
```
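As a side note on how I check accuracy: a simple hold-out comparison can be done in base R without touching the hts machinery. This is a sketch with made-up numbers, not my actual data; `actual` stands in for the last four observed months of one series, and each `fc_*` vector is a method's forecast for those months:

```r
actual   <- c(369, 380, 372, 390)  # held-out months (illustrative values)
fc_ets   <- c(360, 375, 370, 385)  # hypothetical ETS forecast
fc_naive <- c(369, 369, 369, 369)  # last observed value carried forward

# Mean absolute error over the hold-out window
mae <- function(f, a) mean(abs(f - a))
mae(fc_ets, actual)    # 5.25
mae(fc_naive, actual)  # 8.75
```

Comparing each method's hold-out MAE against a naive benchmark like this is a cheap way to see which ensemble members are pulling their weight.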
I then have a function that repeats the process with each of the following forecasting methods:
- hw
- stlf
- auto.arima
- nnetar
The final forecast is a simple average of the five methods (ETS plus the four above).
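The averaging step looks roughly like this. It's a minimal base-R sketch under the assumption that each method's forecast has been read back into a numeric matrix of the same dimensions (rows = forecast horizon, columns = series); the matrices and their values here are made up for illustration:

```r
# Hypothetical forecast matrices, one per method (2 horizons x 2 series)
fc_ets   <- matrix(c(400, 60, 420, 55), nrow = 2)
fc_hw    <- matrix(c(410, 62, 430, 58), nrow = 2)
fc_stlf  <- matrix(c(395, 59, 415, 54), nrow = 2)
fc_arima <- matrix(c(405, 61, 425, 56), nrow = 2)
fc_nnet  <- matrix(c(402, 60, 422, 55), nrow = 2)

# Element-wise mean across the five methods
ensemble <- (fc_ets + fc_hw + fc_stlf + fc_arima + fc_nnet) / 5
ensemble[1, 1]  # 402.4
```

An equal-weight average is the simplest choice; weights based on each method's hold-out error are a common refinement.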
My question is this: is there a better, more accurate, or more elegant way to do this? I want to be sure I'm producing the most accurate time series forecast I can. I appreciate all feedback!