Creating a discrete choice dataset for mlogit / rchoice / mnlogit

(I have revised my question with a “reprex”; I hope it makes more sense now.)

I want to analyze discrete choice data from a panel dataset (i.e., individual ID + individual characteristics + choices made when presented with a choice set).

I have to create this discrete choice panel dataset from two “inputs”:
(1) a statistical design matrix, identifying the choice sets shown to the respondents (described as alternatives with attributes)
(2) respondent data, identifying the choices made and individual characteristics (gender, income, etc.)
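
The two inputs are linked through the block variable: every respondent is shown all design rows of their own block. A minimal base-R sketch of that link, using a few columns of the sample data from the reprex below (the object names design and resp are assumptions, not from the original post):

```r
# Design: 1 row per alternative (block 1 only, questions 1-3, as in the reprex)
design <- data.frame(block = c(1, 1, 1, 1, 1, 1),
                     qes   = c(1, 1, 2, 2, 3, 3),
                     alt   = c(1, 2, 1, 2, 1, 2))
# Respondents: 1 row per respondent, with the block each one was assigned to
resp <- data.frame(id = c(1, 2, 3, 4, 5), block = c(1, 2, 1, 1, 1))

# Inner join on block: every alternative of every choice set each respondent was shown
shown <- merge(resp, design, by = "block")
shown <- shown[order(shown$id, shown$qes, shown$alt), ]
nrow(shown)  # 24
```

Respondents 1, 3, 4 and 5 each get the 6 design rows of block 1; respondent 2 is dropped because the sample design has no rows for block 2. Adding all.x = TRUE to merge() would keep such respondents, with NA in the design columns.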

My question is how to combine these two inputs into a dataset that I can analyze with R functions and packages such as “clogit” (from survival), “mlogit”, “rchoice”, “mnlogit”, etc. Specifically, I want to create a “long” format dataset (see e.g.,

In my reprex below I provide:
(1) a sample statistical design that is similar to mine
(2) a sample respondent dataset that is similar to mine

Can you help with the R code to create a “long” discrete choice dataset (i.e., 1 row per alternative, per choice set, per respondent)?
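
In the “long” layout the unit of observation is one alternative within one choice set for one respondent, so the row count is respondents × choice sets × alternatives. A sketch of that skeleton with expand.grid(), sized like the reprex below (5 respondents, 2 questions, 2 alternatives); the column name qpos is hypothetical:

```r
# One row per alternative (fastest-varying), within question, within respondent
skeleton <- expand.grid(alt = 1:2, qpos = 1:2, id = 1:5)
skeleton <- skeleton[, c("id", "qpos", "alt")]  # put the identifiers in nesting order
nrow(skeleton)  # 20 = 5 respondents * 2 questions * 2 alternatives
head(skeleton, 4)
```

The attributes and the choice indicator are then filled in row by row, while respondent characteristics such as gender and income are repeated on every row of that respondent.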

Based on my reprex sample datasets below, it would seem that my desired dataset should contain at least the following variables:

  • id - identifying respondent
  • block - identifying survey block
  • qes - identifying the choice set/question the respondent faced
  • alt - the alternative included in the choice set/question
  • choice - whether the respondent chose this alternative (in each choice set, either alt 1 or alt 2 is chosen)
  • asc - alternative-specific constant
  • att.loc - level of attribute 1 used in the alternative
  • att.size - level of attribute 2 used in the alternative
  • gender
  • income
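
In the long layout, choice is an indicator on each alternative row rather than a single value per question: it is TRUE (or 1) only on the row of the alternative the respondent picked. In the sample design, asc is likewise just an indicator for alternative 2. A small sketch for a single choice set (the values and the names chosen_alt and rows are illustrative assumptions):

```r
# One choice set with two alternatives; suppose the respondent picked alternative 2
chosen_alt <- 2
rows <- data.frame(alt = c(1, 2))

rows$choice <- rows$alt == chosen_alt     # TRUE only on the chosen alternative's row
rows$asc    <- as.integer(rows$alt == 2)  # 0 for alt 1, 1 for alt 2, as in the sample design
rows
```

If a function expects 0/1 rather than TRUE/FALSE, as.integer(rows$choice) converts it.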


# First: statistical design matrix. 1 row per alternative. Each question/choice set has 2 (unlabeled) alternatives.
# I show only the first 3 questions/choice sets, i.e., 6 obs. (object name "design" assumed)
design <- data.frame(block    = c(1,1,1,1,1,1),  # 4 blocks of respondents; each received 6 questions/choice sets
                     qes      = c(1,1,2,2,3,3),  # identifies which of the 24 questions/choice sets in the statistical design
                     alt      = c(1,2,1,2,1,2),  # each respondent faced 2 alternatives in each question/choice set
                     asc      = c(0,1,0,1,0,1),  # alternative-specific constant
                     att.loc  = c(0,1,1,0,1,1),  # attribute 1: categorical variable
                     att.size = c(0,0,1,1,2,0))  # attribute 2: categorical variable
# Second: respondent data. 1 row per respondent. I show only the first 5 respondents and only 2 choice sets (q1, q2).
# (object name "resp" assumed)
resp <- data.frame(id     = c(1,2,3,4,5),  # respondent ID
                   block  = c(1,2,1,1,1),  # corresponds to "block" in the design data frame
                   q1     = c(1,2,2,1,1),  # respondent's choice in q1: 1 = alt 1 chosen, 2 = alt 1 not chosen (alt 2 chosen)
                   q2     = c(1,2,2,1,1),  # respondent's choice in q2: same coding as q1
                   gender = c(1,2,2,2,3),
                   income = c(1,1,5,3,5))
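
One way to build the long dataset from the two sample inputs, using only base R: reshape the respondent data to one row per respondent × question, merge in the design rows of the matching block and question, then flag the chosen alternative. This is a sketch under assumptions the reprex does not state: the objects are called design and resp, column qK refers to the K-th distinct qes of the respondent's block (in the order the qes values appear in design), and a value of 1 or 2 in qK is the number of the chosen alternative. The helper columns qpos, chosen_alt and chid are hypothetical names.

```r
# Sample inputs from the reprex (object names assumed)
design <- data.frame(block = c(1,1,1,1,1,1), qes = c(1,1,2,2,3,3),
                     alt = c(1,2,1,2,1,2), asc = c(0,1,0,1,0,1),
                     att.loc = c(0,1,1,0,1,1), att.size = c(0,0,1,1,2,0))
resp <- data.frame(id = c(1,2,3,4,5), block = c(1,2,1,1,1),
                   q1 = c(1,2,2,1,1), q2 = c(1,2,2,1,1),
                   gender = c(1,2,2,2,3), income = c(1,1,5,3,5))

# ASSUMPTION: qK in resp = K-th distinct qes of that block, in design order
design$qpos <- ave(design$qes, design$block, FUN = function(q) match(q, unique(q)))

# 1. Respondent data to long: one row per respondent x question
q_cols <- grep("^q[0-9]+$", names(resp), value = TRUE)
resp_long <- reshape(resp, direction = "long",
                     varying = q_cols, v.names = "chosen_alt",
                     timevar = "qpos", times = as.integer(sub("^q", "", q_cols)),
                     idvar = "id")

# 2. Attach that question's alternatives: one row per respondent x question x alternative
long <- merge(resp_long, design, by = c("block", "qpos"))

# 3. Choice indicator: TRUE on the row of the alternative the respondent picked
long$choice <- long$alt == long$chosen_alt

# 4. Sort, then number the choice situations (one chid per respondent x question)
long <- long[order(long$id, long$qes, long$alt), ]
long$chid <- cumsum(!duplicated(long[c("id", "qes")]))
rownames(long) <- NULL

# Keep the variables listed in the question, plus chid
long <- long[, c("id", "block", "qes", "chid", "alt", "choice", "asc",
                 "att.loc", "att.size", "gender", "income")]
long
```

With the sample data this gives 16 rows (4 respondents × 2 questions × 2 alternatives): respondent 2 drops out because the sample design has no block 2 rows, and qes 3 is unused because resp only contains q1 and q2. chid identifies each choice situation, which is what a conditional logit stratifies on, e.g. survival::clogit(choice ~ factor(att.loc) + factor(att.size) + strata(chid), data = long), with factor() because both attributes are categorical.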

Can you provide a reproducible example? It will make it easier for us to help you resolve the problem.


Hey, sorry, rookie mistake. See my revised original post above: I have clarified it and created a "reprex". Hope it makes more sense now. Thanks in advance.