Hello
I am trying to run a latent class analysis on my data from a discrete choice experiment. Respondents had to choose between 2 options described by two attributes: the number of children they prefer, and the educational level they prefer for those children (expressed as a mixture, i.e. proportions of the number of children). The first rows of my data look like this:
Respondent Block Choice card Chosen FNoPrimary FPrimary FSecondary FTertiary MNoPrimary
1 1 1 1 0.0000000 0.0000000 0.00 0.0000000 0.0000000
1 1 1 0 0.3333333 0.6666667 0.00 0.0000000 0.0000000
1 2 12 0 0.3333333 0.3333333 0.00 0.0000000 0.0000000
1 2 12 1 0.1666667 0.0000000 0.00 0.3333333 0.1666667
1 3 2 0 0.0000000 0.0000000 1.00 0.0000000 0.0000000
1 3 2 1 0.0000000 0.0000000 0.25 0.0000000 0.0000000
MPrimary MSecondary MTertiary NChildren Age District Religion Indigenous Ethnic group Sex
1 0 1.00 0.0000000 1 18 0 Protestant 0 Wolaita Female
2 0 0.00 0.0000000 3 18 0 Protestant 0 Wolaita Female
3 0 0.00 0.3333333 9 18 0 Protestant 0 Wolaita Female
4 0 0.00 0.3333333 12 18 0 Protestant 0 Wolaita Female
5 0 0.00 0.0000000 1 18 0 Protestant 0 Wolaita Female
6 0 0.25 0.5000000 4 18 0 Protestant 0 Wolaita Female
Educational level Studentornot Farmerornot Marital status Having children Ever used contraception
1 High school - grade 10 1 0 0 0 0
2 High school - grade 10 1 0 0 0 0
3 High school - grade 10 1 0 0 0 0
4 High school - grade 10 1 0 0 0 0
5 High school - grade 10 1 0 0 0 0
6 High school - grade 10 1 0 0 0 0
Alternative
1 1
2 2
3 1
4 2
5 1
6 2
I looked at the packages available in R, and I think that only the gmnl package can handle my type of data and also include covariates. However, when I fit a simple latent class model with only 2 covariates (age and district) in R, I get completely different output than when I run the same analysis in Stata (code for both below).
In R:
defining_data <- mlogit.data(final_data_alternativeadded, id.var = "Respondent", choice = "Chosen", alt.var = "Alternative", chid.var="Choice.card", group.var = "Block", varying = 7:15, shape = "long")
mnl <- gmnl(Chosen ~ 1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + z | 0 | 0 | 0 | Age + District, data = defining_data, model = 'lc', Q = 3)
summary(mnl)
In Stata:
ssc install lclogit
ssc install fmlogit
lclogit chosen fprimary fsecondary ftertiary mnoprimary mprimary msecondary mtertiary block nchildren, group(choicecard) id(respondent) nclasses(3) membership(age district)
I tried converting all my variables to numeric, multiplying the mixture proportions by the number of children so that the values are on a similar scale, and sorting my dataset by choice card within each respondent, but I always get different values for the latent class probabilities. Does anyone know why?
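A minimal sketch of the rescaling and sorting steps mentioned above, in base R. The `toy` data frame and its values are made up for illustration and are not my real data; only two of the proportion columns are shown, and the column names are assumed to match the printed data.

```r
# Made-up rows mimicking the layout of the real data (illustrative values only).
toy <- data.frame(
  Respondent  = c(1, 1, 1, 1),
  Choice.card = c(12, 12, 1, 1),
  Alternative = c(1, 2, 1, 2),
  FPrimary    = c(0.5, 0, 0, 2/3),
  MTertiary   = c(0.5, 1/3, 0, 0),
  NChildren   = c(2, 12, 1, 3)
)

# Columns assumed to hold mixture proportions (a subset, for brevity).
share_cols <- c("FPrimary", "MTertiary")

# Turn each proportion into a number of children at that education level.
scaled <- toy
scaled[share_cols] <- toy[share_cols] * toy$NChildren

# Sort by respondent, then choice card, then alternative.
scaled <- scaled[order(scaled$Respondent, scaled$Choice.card, scaled$Alternative), ]
rownames(scaled) <- NULL

print(scaled)
```

The same `scaled` data frame would then be passed to `mlogit.data()` in place of the original one.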
Thank you very much in advance
Kind regards
Eva