I'm using two R packages, clickstream and markovchain, to:
- build a Markov chain model from a dataset of channel sequences and use it to perform attribution across the marketing channels in those sequences (with the markovchain package);
- use the same model to predict the probability that new sequences convert. These predictions serve as a "validation" of the model as an attribution tool ("if it predicts well, then it is a good attributor too").
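To make the "one model for both attribution and prediction" idea concrete, here is a minimal base-R sketch. It does not use the clickstream or markovchain APIs; the toy paths, the "(start)", "(conversion)" and "(null)" states, and all function names are hypothetical. It fits one first-order transition matrix, reads conversion probabilities off the absorbing-chain solution, and computes attribution with the removal effect, a common Markov attribution heuristic that may or may not match what the packages above do.

```r
# Hypothetical toy data: each element is one customer path of channel touches,
# and `converted` records whether that path ended in a conversion.
paths <- list(c("A", "B"), c("A", "C"), "B", c("C", "A", "B"), "C")
converted <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

ABSORBING <- c("(conversion)", "(null)")

# Maximum-likelihood first-order transition matrix (row-normalised counts)
# with an artificial "(start)" state and two absorbing outcome states.
fit_transitions <- function(paths, converted, states) {
  all_states <- c("(start)", states, ABSORBING)
  counts <- matrix(0, length(all_states), length(all_states),
                   dimnames = list(all_states, all_states))
  for (i in seq_along(paths)) {
    s <- c("(start)", paths[[i]], if (converted[i]) "(conversion)" else "(null)")
    for (k in seq_len(length(s) - 1)) {
      counts[s[k], s[k + 1]] <- counts[s[k], s[k + 1]] + 1
    }
  }
  counts["(conversion)", "(conversion)"] <- 1  # absorbing states loop on themselves
  counts["(null)", "(null)"] <- 1
  counts / rowSums(counts)  # a state listed in `states` but never seen gives NaN here
}

# Probability of ending in each absorbing state from every transient state:
# B = (I - Q)^(-1) R for the absorbing-chain decomposition of P.
absorption <- function(P) {
  transient <- setdiff(rownames(P), ABSORBING)
  Q <- P[transient, transient, drop = FALSE]
  R <- P[transient, ABSORBING, drop = FALSE]
  B <- solve(diag(length(transient)) - Q, R)
  dimnames(B) <- list(transient, ABSORBING)
  B
}

# Removal effect: redirect every transition into `channel` to "(null)" and
# measure the relative drop in the overall conversion probability.
removal_effect <- function(P, channel) {
  base <- absorption(P)["(start)", "(conversion)"]
  P_rm <- P
  P_rm[, "(null)"] <- P_rm[, "(null)"] + P_rm[, channel]
  P_rm[, channel] <- 0
  1 - absorption(P_rm)["(start)", "(conversion)"] / base
}

states <- sort(unique(unlist(paths)))
P <- fit_transitions(paths, converted, states)
B <- absorption(P)
p_conv <- B["(start)", "(conversion)"]                  # overall conversion probability
re <- sapply(states, function(ch) removal_effect(P, ch))
attribution <- re / sum(re)                              # share of credit per channel

# Conversion probability of a new, still-open path: absorption probability
# from its last touch (first-order assumption). An unseen last touch fails here.
predict_conversion <- function(B, path) B[path[length(path)], "(conversion)"]
predict_conversion(B, c("C", "A"))
```

With the toy data, `p_conv` is 0.6 (3 of 5 paths convert) and the attribution shares for A, B and C are 6/17, 9/17 and 2/17. Calling `predict_conversion()` on a path whose last touch is not a row of `B` stops with "subscript out of bounds", which is the same failure described below.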
Apart from the choice of packages, the issue I'm facing is that many sequences in the dataset contain very rare channels, some of which appear only once or a few times. Here is the frequency of each channel:
table(data$interaction_type) (channel: count)
C-Level_3RDLIVE: 11, C-Level_3RDWEBINOD: 1, C-Level_3RDWP: 12, C-Level_AR: 2, C-Level_ARCHWEB: 1, C-Level_ASKOD: 1,
C-Level_CR: 1, C-Level_EBOOK: 6, C-Level_ID: 3, C-Level_MEDIA: 3, C-Level_ODSASWEBIN: 1, C-Level_ONASOFF: 1,
C-Level_OOTR: 1, C-Level_PEV: 1, C-Level_RMCHR: 1, C-Level_SASCON: 2, C-Level_SASEXEC: 9, C-Level_SASLIVE: 29,
C-Level_SASWEB: 4, C-Level_SASWEBIN: 2, C-Level_SASWP: 11, C-Level_SD: 1, C-Level_SEFR: 2, C-Level_SRSLT: 2,
C-Level_TEL: 1, C-Level_WBR: 1, C-Level_WPR: 7, C-Level_WS: 1, Director_: 15, Director_3RDLIVE: 33,
Director_3RDWP: 15, Director_ARCHWEB: 2, Director_CHAT: 1, Director_COMR: 1, Director_CR: 2, Director_EBOOK: 3,
Director_EXECA: 4, Director_MEDIA: 3, Director_PEV: 1, Director_RMCHR: 1, Director_SASCON: 5, Director_SASEXEC: 30,
Director_SASLIVE: 106, Director_SASWEB: 3, Director_SASWP: 12, Director_SD: 1, Director_SEFR: 9, Director_SRSLT: 10,
Director_TEL: 4, Director_WPR: 7, Manager_: 7, Manager_3RDLIVE: 28, Manager_3RDWEBIN: 2, Manager_3RDWP: 42,
Manager_AR: 5, Manager_ARCHWEB: 2, Manager_ASK: 1, Manager_ASKOD: 1, Manager_CS: 2, Manager_DBM: 1
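`table()` counts touches, but for cross-validation what matters is how many distinct sequences contain each channel. A minimal sketch, assuming a hypothetical long-format `data` with one row per touch and an assumed `path_id` column identifying the sequence (the toy rows and the threshold are illustrative):

```r
# Hypothetical long-format data: one row per touch. `path_id` is an assumed
# column identifying the sequence each touch belongs to.
data <- data.frame(
  path_id = c(1, 1, 2, 2, 3, 3, 3),
  interaction_type = c("Director_SASLIVE", "Manager_3RDWP",
                       "Director_SASLIVE", "C-Level_ASKOD",
                       "Manager_3RDWP", "Director_SASLIVE", "Manager_3RDWP")
)

touch_count <- table(data$interaction_type)   # touches per channel, as in the question

# Sequences per channel: count each (sequence, channel) pair only once.
pairs <- unique(data[, c("path_id", "interaction_type")])
seq_count <- table(pairs$interaction_type)

# A channel present in a single sequence can never be in the training folds
# when that sequence is held out, so flag channels below this threshold.
min_sequences <- 2  # hypothetical cut-off
rare <- names(seq_count)[seq_count < min_sequences]
rare
```

Here Manager_3RDWP has 3 touches but occurs in only 2 sequences, and C-Level_ASKOD is flagged because it occurs in a single sequence.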
As you can see, many of them appear in only one sequence.
The problem arises with 10-fold cross-validation: some of these channels end up only in the test fold and never appear in the training folds. predict() then cannot make predictions for those test sequences, because the fitted model has no transition probabilities (the missing "coefficients") for states it never saw during training.
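The fold problem can be checked before calling predict(). A minimal sketch, assuming folds are assigned per sequence (not per touch) and using hypothetical toy paths in which "RARE1" and "RARE2" each occur once; it lists, for every fold, the test channels that its training folds never contain:

```r
# Hypothetical paths; "RARE1" and "RARE2" each occur in a single sequence.
paths <- list(c("A", "B"), c("A", "RARE1"), c("B", "A"),
              c("A", "B", "RARE2"), "B", "A")

set.seed(1)
k <- 3
# Assign whole sequences (not individual touches) to balanced random folds.
fold <- sample(rep_len(seq_len(k), length(paths)))

# For each fold: channels present in its test sequences but absent from training.
unseen_by_fold <- lapply(seq_len(k), function(f) {
  train_states <- unique(unlist(paths[fold != f]))
  test_states <- unique(unlist(paths[fold == f]))
  setdiff(test_states, train_states)
})
unseen_by_fold
```

Whatever the seed, a channel that occurs in only one sequence always lands in the unseen list of that sequence's fold, so with data like the table above some folds are guaranteed to contain states the trained chain cannot score.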
- How would you handle these rare levels? Is there anything specific to Markov chains to take into account?
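One Markov-chain-specific option is additive (Laplace) smoothing over a fixed state space: fit on the full channel vocabulary and add a pseudo-count to every allowed transition, so a channel absent from the training fold still has a valid row instead of 0/0. This is a swapped-in technique sketched in base R, not something clickstream or markovchain is assumed to do; `fit_smoothed`, `alpha` and the toy data are hypothetical.

```r
ABSORBING <- c("(conversion)", "(null)")

# First-order transition matrix over a FIXED state space `states` (for example,
# every channel name in the full dataset), with additive smoothing: pseudo-count
# `alpha` on every allowed transition, so channels absent from the training
# fold still get a proper probability row instead of 0/0 = NaN.
fit_smoothed <- function(paths, converted, states, alpha = 1) {
  all_states <- c("(start)", states, ABSORBING)
  counts <- matrix(0, length(all_states), length(all_states),
                   dimnames = list(all_states, all_states))
  # Nothing transitions back into "(start)"; absorbing rows get no pseudo-counts.
  counts[c("(start)", states), c(states, ABSORBING)] <- alpha
  for (i in seq_along(paths)) {
    s <- c("(start)", paths[[i]], if (converted[i]) "(conversion)" else "(null)")
    for (k in seq_len(length(s) - 1)) {
      counts[s[k], s[k + 1]] <- counts[s[k], s[k + 1]] + 1
    }
  }
  counts["(conversion)", "(conversion)"] <- 1
  counts["(null)", "(null)"] <- 1
  counts / rowSums(counts)
}

# Absorption probabilities B = (I - Q)^(-1) R from every transient state.
absorption <- function(P) {
  transient <- setdiff(rownames(P), ABSORBING)
  Q <- P[transient, transient, drop = FALSE]
  R <- P[transient, ABSORBING, drop = FALSE]
  B <- solve(diag(length(transient)) - Q, R)
  dimnames(B) <- list(transient, ABSORBING)
  B
}

# Hypothetical training fold: "RARE" occurs in the full data but not here.
train_paths <- list(c("A", "B"), "A", "B", c("B", "A"))
train_converted <- c(TRUE, FALSE, TRUE, FALSE)
P <- fit_smoothed(train_paths, train_converted, states = c("A", "B", "RARE"))
B <- absorption(P)
B["RARE", "(conversion)"]  # defined, although "RARE" was never observed
```

The prediction for an unseen channel is driven entirely by the pseudo-counts, so it is closer to a prior than an estimate, and `alpha` also shifts every observed row slightly. If this route is taken, the same smoothed model would need to be used for both attribution and validation.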
- I have also read that rare levels can be binned together into a single class ("other") so that the training and test sets share the same levels. However, I wonder whether reducing the number of levels for the predictive task would produce a different model from the one used for attribution (where the rare levels are not an issue). If so, good predictive performance could no longer be used to justify the attribution.
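The binning idea can be sketched so that attribution and prediction share one state space: build the rare-channel mapping once and apply it to every path before fitting either model. A minimal base-R sketch; `collapse_rare`, the threshold, and the option of keeping the role prefix (e.g. "C-Level_other" instead of a single "other") are hypothetical choices, not part of the question:

```r
# Collapse channels that occur in fewer than `min_sequences` sequences into an
# "other" bucket, optionally keeping the role prefix before the first "_".
collapse_rare <- function(paths, min_sequences = 5, keep_role = TRUE) {
  seq_count <- table(unlist(lapply(paths, unique)))  # sequences per channel
  rare <- names(seq_count)[seq_count < min_sequences]
  relabel <- function(ch) {
    if (!(ch %in% rare)) return(ch)
    if (keep_role) sub("_.*$", "_other", ch) else "other"
  }
  lapply(paths, function(p) vapply(p, relabel, character(1), USE.NAMES = FALSE))
}

# Hypothetical paths using channel names from the table above.
paths <- list(
  c("Director_SASLIVE", "Manager_3RDWP"),
  c("Director_SASLIVE", "C-Level_ASKOD"),
  c("Manager_3RDWP", "Director_SASLIVE"),
  c("C-Level_OOTR", "Director_SASLIVE")
)
collapsed <- collapse_rare(paths, min_sequences = 2)
collapsed[[2]]
```

If the attribution model is also fitted on `collapsed`, the cross-validated model and the attribution model have the same specification, so prediction performance speaks to that attribution. The cost is that rare channels receive credit only as a group. Computing the mapping on the full data uses channel counts but not outcomes; a stricter variant builds it inside each training fold and sends unseen test channels to "other" as well.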