Creating discrete choice dataset for mlogit / rchoice / mnlogit

(I have revised my question with a “reprex”; I hope it makes more sense now.)

I want to analyze discrete choice data from a panel dataset (i.e., individual ID + individual characteristics + choices made when presented with a choice set).

I have to create this discrete choice panel dataset from two “inputs”:
(1) a statistical design matrix, identifying the choice sets shown to respondents (described as alternatives with attributes);
(2) respondent data, identifying the choices made plus individual characteristics (gender, income, etc.).

My question is how to combine these two inputs into a dataset that I can analyze with R tools such as “clogit”, “mlogit”, “rchoice”, and “mnlogit”. Specifically, I want to create a “long” format dataset (see e.g.,

In my reprex below I provide:
(1) a sample statistical design that is similar to mine
(2) a sample respondent dataset that is similar to mine

Can you help with the R code to create a “long” discrete choice dataset (i.e., one row per respondent × choice set × alternative)?

Based on my reprex sample datasets below, it would seem that my desired dataset should contain at least the following variables:

  • id - identifying the respondent
  • block - identifying the survey block
  • qes - identifying the choice set/question the respondent faced
  • alt - the alternative included in the choice set/question
  • choice - the choice the respondent made (either alt 1 or alt 2); in the long format, an indicator of whether the row's alternative was the one chosen
  • asc - alternative-specific constant
  • att.loc - level of attribute 1 used in the alternative
  • att.size - level of attribute 2 used in the alternative
  • gender
  • income
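To make the requested “long” layout concrete, here is a hand-built sketch of the two rows that respondent 1's answer to question 1 would produce, read off the reprex sample data (respondent 1 is in block 1, answered q1 = 1, and question 1 of block 1 has alternatives 1 and 2). It assumes `choice` is stored as a TRUE/FALSE flag on each alternative, the usual long-format convention, rather than as the chosen alternative's number; the object name `target_r1_q1` is hypothetical.

```r
# Hypothetical target rows for respondent 1, question 1 (values taken from the reprex).
# Assumption: `choice` is a TRUE/FALSE flag per alternative, not the alternative number.
target_r1_q1 <- data.frame(
  id       = c(1, 1),
  block    = c(1, 1),
  qes      = c(1, 1),
  alt      = c(1, 2),          # one row per alternative in the choice set
  choice   = c(TRUE, FALSE),   # TRUE only for the alternative the respondent picked
  asc      = c(0, 1),          # copied from the design rows for qes 1
  att.loc  = c(0, 1),
  att.size = c(0, 0),
  gender   = c(1, 1),          # respondent-level variables repeat on every row
  income   = c(1, 1)
)
print(target_r1_q1)
```

Every respondent × question pair expands into one such two-row group, so exactly one row per group has `choice == TRUE`.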


# First: statistical design matrix. 1 row per alternative. Each question/choice set has 2 (unlabeled) alternatives.
# I show only the first 3 questions/choice sets, i.e., 6 obs.
# (`design` and `resp` are placeholder object names)
design <- data.frame(block    = c(1,1,1,1,1,1),  # 4 blocks of respondents; each received 6 questions/choice sets
                     qes      = c(1,1,2,2,3,3),  # identifies which of the 24 questions/choice sets in the statistical design
                     alt      = c(1,2,1,2,1,2),  # each respondent faced 2 alternatives in each question/choice set
                     asc      = c(0,1,0,1,0,1),  # alternative-specific constant
                     att.loc  = c(0,1,1,0,1,1),  # attribute 1: categorical variable
                     att.size = c(0,0,1,1,2,0))  # attribute 2: categorical variable
# Second: respondent data. 1 row per respondent. I show only the first 5 respondents and only 2 choice sets (q1, q2).
resp <- data.frame(id     = c(1,2,3,4,5),  # respondent ID
                   block  = c(1,2,1,1,1),  # corresponds to "block" in the design data frame
                   q1     = c(1,2,2,1,1),  # respondent's choice in q1: alt 1 or alt 2
                   q2     = c(1,2,2,1,1),  # respondent's choice in q2: alt 1 or alt 2
                   gender = c(1,2,2,2,3),
                   income = c(1,1,5,3,5))
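One way to build the full long dataset in base R is sketched below: reshape the respondent answers to one row per respondent × question, then merge on block and question position so each answer picks up both alternatives of the matching choice set. Assumptions not confirmed by the post: `design` and `resp` are placeholder names for the two reprex data frames; the answer columns are named `q1`, `q2`, … and hold the number of the chosen alternative; and q*k* corresponds to the *k*-th distinct `qes` value within the respondent's block (so the join works whether `qes` restarts at 1 in each block or runs from 1 to 24 across blocks). The `chid` column is a hypothetical choice-situation index, added because conditional logit routines need one.

```r
# Reprex inputs (placeholder names `design` and `resp`).
design <- data.frame(block    = c(1, 1, 1, 1, 1, 1),
                     qes      = c(1, 1, 2, 2, 3, 3),
                     alt      = c(1, 2, 1, 2, 1, 2),
                     asc      = c(0, 1, 0, 1, 0, 1),
                     att.loc  = c(0, 1, 1, 0, 1, 1),
                     att.size = c(0, 0, 1, 1, 2, 0))
resp <- data.frame(id     = c(1, 2, 3, 4, 5),
                   block  = c(1, 2, 1, 1, 1),
                   q1     = c(1, 2, 2, 1, 1),
                   q2     = c(1, 2, 2, 1, 1),
                   gender = c(1, 2, 2, 2, 3),
                   income = c(1, 1, 5, 3, 5))

# Assumption: q1, q2, ... refer to the 1st, 2nd, ... distinct qes value in each block.
design$qpos <- ave(design$qes, design$block,
                   FUN = function(q) match(q, sort(unique(q))))

# Wide -> long: one row per respondent x question, answer stored in `chosen_alt`.
qcols <- grep("^q[0-9]+$", names(resp), value = TRUE)
answers <- reshape(resp,
                   direction = "long",
                   varying   = qcols,
                   v.names   = "chosen_alt",
                   timevar   = "qpos",
                   times     = as.numeric(sub("^q", "", qcols)),
                   idvar     = "id")

# Attach both alternatives of the matching choice set (inner join on block + position).
long <- merge(answers, design, by = c("block", "qpos"))

# Long-format choice indicator: TRUE for the alternative the respondent picked.
long$choice <- long$alt == long$chosen_alt

# Order rows and keep the variables listed in the question.
long <- long[order(long$id, long$qes, long$alt),
             c("id", "block", "qes", "alt", "choice", "asc",
               "att.loc", "att.size", "gender", "income")]
rownames(long) <- NULL

# Hypothetical choice-situation index: one value per respondent x question.
key <- paste(long$id, long$qes)
long$chid <- match(key, unique(key))
print(long)
```

Respondent 2 drops out here only because the reprex design contains block 1 rows alone: `merge()` is an inner join by default, so answers without matching design rows are silently dropped, and comparing `nrow(long)` with 2 × (number of answered questions) is a cheap check once the full design is loaded. With the full data, a conditional logit could then be fitted with, e.g., `survival::clogit(choice ~ asc + factor(att.loc) + factor(att.size) + strata(chid), data = long)`; recent mlogit versions index long data via the dfidx package, so check its documentation for the exact index arguments.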

Can you provide a reproducible example? It will make it easier for us to help you resolve the problem.

Hey, sorry, rookie mistake. See my revised original post above: I have clarified it and created a "reprex". Hope it makes more sense now; thanks in advance.
