Markov Chain predictive model combined with rare levels in sequential variable

davecrt · January 27, 2020, 10:43am

I'm using two R packages, clickstream and markovchain, to:

create a Markov chain model that will be used to attribute conversions across the marketing channels in a dataset of channel sequences (using the markovchain package)
use the same model to predict the probability that new sequences convert. The predictions will serve as "validation" of the model's attribution ("if it predicts well, then it is a good attributor too").
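
To make the first step concrete, here is a minimal base R sketch of estimating a first-order transition matrix from channel sequences. It is not the code from clickstream or markovchain; the toy `paths` and the shortened channel names are assumptions made up for illustration.

```r
# Hypothetical toy paths; channel names are shortened from the table in this post.
paths <- list(
  c("SASLIVE", "3RDWP", "SASEXEC"),
  c("3RDWP", "SASLIVE", "SASLIVE"),
  c("SASLIVE", "SASEXEC")
)

states <- sort(unique(unlist(paths)))

# Collect (from, to) pairs within each path, never across paths.
pairs <- do.call(rbind, lapply(paths, function(p) {
  data.frame(from = head(p, -1), to = tail(p, -1))
}))

# Count transitions on a fixed state set so every state gets a row and a column.
counts <- table(
  factor(pairs$from, levels = states),
  factor(pairs$to, levels = states)
)

# Row-normalise counts into probabilities; rows with no outgoing
# transitions stay all zero instead of being divided by zero.
row_totals <- rowSums(counts)
trans <- counts / ifelse(row_totals == 0, 1, row_totals)
print(round(trans, 2))
```

A channel that occurs only once contributes a single row or column entry here, which is why the rare channels matter so much once the data is split.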

Packages aside, the issue I'm facing is that many sequences in the dataset include channels so rare that they appear only once or a few times. Here is the frequency of each channel:

table(data$interaction_type)

   C-Level_3RDLIVE     C-Level_3RDWEBINOD          C-Level_3RDWP             C-Level_AR        C-Level_ARCHWEB          C-Level_ASKOD 
                11                      1                     12                      2                      1                      1 
        C-Level_CR          C-Level_EBOOK             C-Level_ID          C-Level_MEDIA     C-Level_ODSASWEBIN        C-Level_ONASOFF 
                 1                      6                      3                      3                      1                      1 
      C-Level_OOTR            C-Level_PEV          C-Level_RMCHR         C-Level_SASCON        C-Level_SASEXEC        C-Level_SASLIVE 
                 1                      1                      1                      2                      9                     29 
    C-Level_SASWEB       C-Level_SASWEBIN          C-Level_SASWP             C-Level_SD           C-Level_SEFR          C-Level_SRSLT 
                 4                      2                     11                      1                      2                      2 
       C-Level_TEL            C-Level_WBR            C-Level_WPR             C-Level_WS              Director_       Director_3RDLIVE 
                 1                      1                      7                      1                     15                     33 
    Director_3RDWP       Director_ARCHWEB          Director_CHAT          Director_COMR            Director_CR         Director_EBOOK 
                15                      2                      1                      1                      2                      3 
    Director_EXECA         Director_MEDIA           Director_PEV         Director_RMCHR        Director_SASCON       Director_SASEXEC 
                 4                      3                      1                      1                      5                     30 
  Director_SASLIVE        Director_SASWEB         Director_SASWP            Director_SD          Director_SEFR         Director_SRSLT 
               106                      3                     12                      1                      9                     10 
      Director_TEL           Director_WPR               Manager_        Manager_3RDLIVE       Manager_3RDWEBIN          Manager_3RDWP 
                 4                      7                      7                     28                      2                     42 
        Manager_AR        Manager_ARCHWEB            Manager_ASK          Manager_ASKOD             Manager_CS            Manager_DBM 
                 5                      2                      1                      1                      2                      1

As you can see, many of them appear in only one sequence.

The problem arises with 10-fold cross-validation: some of these channels end up only in the test set and are absent from the training set. The predict() function then cannot make predictions on the test set, because the fitted model has no coefficients for those channels.
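
The situation can be reproduced with a small base R sketch that lists, for each fold, the channels present in the test paths but in no training path. The toy `paths`, the fold count `k = 3` and the seed are assumptions for illustration, not my real data or split.

```r
set.seed(1)  # arbitrary seed, only to make the illustration reproducible

# Hypothetical paths; "TEL" occurs in exactly one of them.
paths <- list(
  c("SASLIVE", "3RDWP"), c("SASLIVE", "SASEXEC"), c("3RDWP", "SASLIVE"),
  c("SASLIVE", "TEL"),   c("SASEXEC", "SASLIVE"), c("3RDWP", "SASEXEC")
)
k <- 3
fold <- sample(rep(seq_len(k), length.out = length(paths)))

# For each fold, find channels seen in the test paths but in no training path.
unseen <- lapply(seq_len(k), function(i) {
  train_states <- unique(unlist(paths[fold != i]))
  test_states  <- unique(unlist(paths[fold == i]))
  setdiff(test_states, train_states)
})
print(unseen)
```

Whichever fold receives the path containing "TEL" reports it as unseen, because a channel that occurs in a single sequence can never be in both the training and the test split.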

How would you handle these rare levels? Is there anything specific to Markov chains to keep in mind?
I have also read about binning the rare levels into a single class ("other") so that both sets have the same levels. However, I wonder whether reducing the number of levels for the prediction task would produce a different model from the one derived for the attribution task (where the rare levels are not an issue), which would mean good prediction performance could no longer be used to justify the attribution.
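
For reference, this is roughly what I mean by binning, as a base R sketch. The helper `lump_rare`, the threshold `min_count` and the toy paths are hypothetical. The frequencies are computed on the training paths only, so nothing from the test set leaks into the recoding.

```r
# Replace channels seen fewer than `min_count` times in the training paths
# with a single "other" state. `lump_rare` and `min_count` are hypothetical.
lump_rare <- function(train_paths, test_paths, min_count = 2) {
  freq <- table(unlist(train_paths))
  keep <- names(freq)[freq >= min_count]
  # Anything not frequent enough in training, including channels that
  # appear only in the test paths, is mapped to "other".
  recode <- function(p) ifelse(p %in% keep, p, "other")
  list(train = lapply(train_paths, recode),
       test  = lapply(test_paths, recode))
}

train <- list(c("SASLIVE", "3RDWP"), c("SASLIVE", "CHAT"), c("3RDWP", "SASLIVE"))
test  <- list(c("SASLIVE", "TEL"))

lumped <- lump_rare(train, test)
print(lumped$test)
```

Two caveats with this sketch: if no training channel falls below `min_count`, then "other" never occurs in the training paths and an unseen test channel is still unseen after recoding. And the recoded paths define a different state space from the raw ones, which is exactly my concern, unless the same recoding were also applied before fitting the attribution model.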
