(I have revised my question to include a “reprex”. I hope it makes more sense now.)
I want to analyze discrete choice data from a panel dataset (i.e., individual ID + individual characteristics + choices made when presented with a choice set).
I have to create this discrete choice panel dataset from two “inputs”:
(1) a statistical design matrix, identifying the choice sets shown to the respondent (described as alternatives with attributes)
(2) respondent data, identifying the choices made + individual characteristics (gender, income, etc.)
My question is how to combine these two inputs into a dataset that I can analyze with R tools such as “clogit”, “mlogit”, “rchoice”, “mnlogit”, etc. Specifically, I want to create a “long” format dataset (see e.g., https://cran.r-project.org/web/packages/mlogit/vignettes/c2.formula.data.html)
In my reprex below I provide:
(1) a sample statistical design that is similar to mine
(2) a sample respondent dataset that is similar to mine
Can you help with the R code to create a (“long”) discrete choice dataset (i.e., 1 row per respondent, per choice set, per alternative)?
Based on my reprex sample datasets below, it would seem that my desired dataset should contain at least the following variables:
- id - identifying respondent
- block - identifying survey block
- qes - identifying choice set/question the respondents faced
- alt - the alternative included in the choice set/question
- choice - the choice the respondent made (either alt 1 or alt 2)
- asc - alternative specific constant
- att.loc - level of attribute 1 used in alternative
- att.size - level of attribute 2 used in alternative
- gender
- income
MY SAMPLE DATASETS
# First: statistical design matrix. 1 row per alternative. Each question/choice set has 2 (unlabeled) alternatives.
# I show only first 3 questions/choice sets, i.e., 6 obs.
stat.design <- data.frame(block = c(1,1,1,1,1,1),    # 4 blocks of respondents. Each received 6 questions/choice sets
                          qes = c(1,1,2,2,3,3),      # identifies which of the 24 different questions/choice sets from the statistical design
                          alt = c(1,2,1,2,1,2),      # each respondent faced 2 alternatives in each question/choice set
                          asc = c(0,1,0,1,0,1),      # alt specific constant
                          att.loc = c(0,1,1,0,1,1),  # attribute 1: categorical variable
                          att.size = c(0,0,1,1,2,0)) # attribute 2: categorical variable
# Second: respondent data. 1 row per respondent. I show only the first 5 respondents and only 2 choice sets (q1, q2)
resp.data <- data.frame(id = c(1,2,3,4,5),     # respondent ID
                        block = c(1,2,1,1,1),  # corresponds to "block" in the stat.design data frame
                        q1 = c(1,2,2,1,1),     # respondent's choice in q1. 1=chosen, 2=not chosen
                        q2 = c(1,2,2,1,1),     # respondent's choice in q2. 1=chosen, 2=not chosen
                        gender = c(1,2,2,2,3),
                        income = c(1,1,5,3,5))
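For reference, here is a minimal base-R sketch of one way the two inputs might be combined. It is not a definitive solution and rests on two assumptions that are not stated in the data themselves: (a) q1, q2, ... store the number of the alternative the respondent picked (1 or 2), and (b) q1, q2, ... refer to the 1st, 2nd, ... question within the respondent's block. The helper columns q.in.block and chosen.alt are names made up for this illustration.

```r
# Sample inputs repeated from above so the sketch runs on its own
stat.design <- data.frame(block = rep(1, 6),
                          qes = c(1, 1, 2, 2, 3, 3),
                          alt = c(1, 2, 1, 2, 1, 2),
                          asc = c(0, 1, 0, 1, 0, 1),
                          att.loc = c(0, 1, 1, 0, 1, 1),
                          att.size = c(0, 0, 1, 1, 2, 0))
resp.data <- data.frame(id = 1:5,
                        block = c(1, 2, 1, 1, 1),
                        q1 = c(1, 2, 2, 1, 1),
                        q2 = c(1, 2, 2, 1, 1),
                        gender = c(1, 2, 2, 2, 3),
                        income = c(1, 1, 5, 3, 5))

# Step 1: stack q1, q2, ... so there is one row per respondent x question.
# ASSUMPTION (b): the number in the column name is the question's position
# within the respondent's block.
q.cols <- grep("^q[0-9]+$", names(resp.data), value = TRUE)
resp.long <- reshape(resp.data,
                     direction = "long",
                     varying = q.cols,
                     v.names = "chosen.alt",   # hypothetical helper column
                     timevar = "q.in.block",   # hypothetical helper column
                     times = as.numeric(sub("^q", "", q.cols)),
                     idvar = "id")

# Step 2: give each design row its question position within its block,
# so that block 2's first question would also match q1, and so on.
stat.design$q.in.block <- ave(stat.design$qes, stat.design$block,
                              FUN = function(x) match(x, sort(unique(x))))

# Step 3: attach the alternatives (and their attributes) to each answered question.
# This is an inner join, so respondents whose block is missing from the
# design (id 2 in this sample) drop out.
long <- merge(resp.long, stat.design, by = c("block", "q.in.block"))

# Step 4: choice indicator, 1 on the row of the chosen alternative, 0 otherwise.
# ASSUMPTION (a): q1, q2, ... hold the number of the chosen alternative.
long$choice <- as.integer(long$alt == long$chosen.alt)

# Tidy up: sort and keep the variables listed in the question
long <- long[order(long$id, long$qes, long$alt),
             c("id", "block", "qes", "alt", "choice", "asc",
               "att.loc", "att.size", "gender", "income")]
rownames(long) <- NULL
long
```

With the sample data this yields one row per respondent, question and alternative. If assumption (a) is wrong, for instance if q1 actually records whether alternative 1 was chosen, only Step 4 would need to change.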