Context is app installs and cumulative revenue.
On a cohort basis (daily or weekly), cumulative revenue is logarithmic in shape: it rises quickly after install and then flattens out. I'd like to build a model that, a week or two after install, predicts a cohort's cumulative revenue many weeks out.
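To illustrate what I mean by "logarithmic in shape", here is a minimal sketch that fits a curve of the form a + b * log(1 + week) to one cohort. The data is synthetic, and the parameter values and the name `predict_cum_rev` are made up for illustration; I'm not claiming this exact functional form is the right one for real cohorts.

```python
import numpy as np

# Synthetic example only: noisy observations of one cohort's cumulative
# revenue over 26 weeks, generated from an assumed log-shaped curve.
rng = np.random.default_rng(0)
weeks = np.arange(1, 27)
true_a, true_b = 100.0, 250.0  # hypothetical parameters, not real figures
cum_rev = true_a + true_b * np.log1p(weeks) + rng.normal(0, 5, size=weeks.size)

# Fit cum_rev ~ a + b * log(1 + week) by ordinary least squares.
X = np.column_stack([np.ones(weeks.size), np.log1p(weeks)])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, cum_rev, rcond=None)

def predict_cum_rev(week):
    """Extrapolate the fitted log curve to an arbitrary week."""
    return a_hat + b_hat * np.log1p(week)

print(f"fitted a={a_hat:.1f}, b={b_hat:.1f}")
print(f"extrapolated week-52 cumulative revenue: {predict_cum_rev(52):.1f}")
```

Note that this fit uses 26 weeks of one cohort's data, which is exactly what I won't have for a new cohort.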
Before diving in I'm trying to visualize in my mind what a training and prediction workflow could look like.
Which historic data do I use to fit a model? Let's say our app has 5 years of historic data. For each new weekly cohort, I'd like to predict the cohort's cumulative revenue at future horizons, e.g. 3 months, 6 months, 12 months, etc. Do I use each cohort's first 7 days of spend behavior to inform a custom prediction per cohort? Or do I just train a model on all historic data and make one uniform prediction, i.e. that every new cohort will have $x, $y, $z cumulative revenue after 3, 6 and 12 months?
Assuming cohorts vary in behavior, I presumably want to use the first few days (e.g. 7) of a cohort's cumulative revenue to inform its prediction, as opposed to ignoring that and using a 'main' model that is trained on all historic data.
If we want to predict as far out as 6 months or a year, then any model must surely have at least 6 months' or a year's worth of historic data to train on. That is, I could not fit a new model to a specific cohort's 7 days of revenue data alone and then attempt to predict what 6 months of revenue looks like. So how can I combine data unique to the cohort with historic data to make a prediction?
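To make the question concrete, this is roughly the kind of setup I can picture, as a sketch only. Each historic cohort that is already at least 180 days old contributes one training row: its day-7 cumulative revenue is the feature and its day-180 cumulative revenue is the target. The data below is synthetic, the names (`day7_rev`, `day180_rev`, `predict_day180`) are hypothetical, and the linear model is just a placeholder, not a claim that it is the right approach.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for historic data: 200 weekly cohorts that are all
# at least 180 days old, so both day-7 and day-180 revenue are observed.
n_cohorts = 200
day7_rev = rng.uniform(500, 2000, size=n_cohorts)
# Assumed relationship, for illustration only: day-180 revenue is about
# 3x day-7 revenue, with 5% multiplicative noise per cohort.
day180_rev = 3.0 * day7_rev * rng.normal(1.0, 0.05, size=n_cohorts)

# Training: regress the long-horizon target on the early-window feature
# across all mature cohorts (intercept + slope via least squares).
X_train = np.column_stack([np.ones(n_cohorts), day7_rev])
coef, *_ = np.linalg.lstsq(X_train, day180_rev, rcond=None)

def predict_day180(new_day7_rev):
    """Predict day-180 cumulative revenue from a cohort's day-7 value."""
    new_day7_rev = np.asarray(new_day7_rev, dtype=float)
    return coef[0] + coef[1] * new_day7_rev

# Prediction: a brand-new cohort for which only 7 days are observed.
pred = predict_day180(1200.0)
print(f"predicted day-180 cumulative revenue: {pred:.0f}")
```

Here the long horizon comes from the historic cohorts and the cohort-specific signal comes from the new cohort's first 7 days. Is this the right general idea, and would I fit one such model per horizon (3, 6, 12 months)?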
Within the above context, what are some good cohort-based approaches to cumulative revenue prediction? What's my training data? Do I use the first 7 days of spend behavior to inform my prediction?