Reproducing a Stata logit analysis in RStudio

Hello guys, I'm new to programming, so please excuse any rookie mistakes.

I need to reproduce a logit analysis table from an academic paper as a training exercise. The Stata code (.do file) reads as follows:

Model 4:

```stata
logit ONSET EXCL LOSS SIZE PRECON i.MARKERS ///
    ONGOING lGDP lPOP PYS SPLINE ///
    DEMOC ELF i.REGDUM ///
    , nolog cluster(CTR)
```

All the variables come from a data frame that I imported with read_dta(). I managed to reproduce the model for every variable except "i.REGDUM", which codes world regions as discrete values from 0 to 5.
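
In Stata, the `i.` prefix is a factor-variable operator: `i.REGDUM` tells `logit` to enter one 0/1 indicator for every value of REGDUM except the lowest one (0), which becomes the base category. The R counterpart is to treat the variable as a factor. The sketch below uses a small made-up vector (not the paper's data) and `model.matrix()` to show which dummy columns a formula term `factor(REGDUM)` generates.

```r
# Toy stand-in for df$REGDUM: region codes 0-5 (made-up values, not the paper's data)
toy <- data.frame(REGDUM = c(0, 1, 2, 3, 4, 5, 3, 5))

# factor() turns the numeric codes into categories; by default the lowest
# level (0) is the reference category, as with Stata's i.REGDUM
X <- model.matrix(~ factor(REGDUM), data = toy)

# An intercept column plus one 0/1 indicator for each of the levels 1 to 5
print(colnames(X))
print(X)
```

Rows with REGDUM equal to 0 have zeros in all five indicator columns, which is what makes 0 the base region.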

```r
library(haven)  # provides read_dta() for Stata .dta files
df <- read_dta("https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/JNVYN0/DR8FNE&version=1.0")
# REL + LING + RACE + REG stand in for i.MARKERS;
# REGDUM currently enters as a single numeric term
mylogit4 <- glm(formula = ONSET ~ EXCL + LOSS + SIZE + PRECON +
    REL + LING + RACE + REG + ONGOING + lGDP + lPOP + PYS + SPLINE1 +
    SPLINE2 + SPLINE3 + DEMOC + ELF + REGDUM, family = "binomial",
    data = df)
```
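
Separately from the REGDUM issue, the `glm()` call above does not reproduce `cluster(CTR)`. In Stata that option leaves the coefficients unchanged but replaces the standard errors with cluster-robust ones (clustered on CTR). The sandwich package (`vcovCL()`) together with lmtest (`coeftest()`) is the usual route in R; the base-R sketch below only illustrates the computation. Assumptions: a logit fitted with `glm(..., family = binomial)`, no rows dropped for missing values (otherwise the cluster vector must first be subset to the rows glm actually used), and a G/(G-1) small-sample factor, which as far as I know is what Stata applies for `logit`. `cluster_vcov()` is a hypothetical helper, and any match with the paper's standard errors should be checked rather than assumed.

```r
# Hypothetical helper: cluster-robust (sandwich) covariance for a logit glm.
# Assumes no rows were dropped for NAs, so `cluster` lines up with the model rows.
cluster_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)
  # Score contribution of each observation under the logit link: x_i * (y_i - p_i)
  scores <- X * (fit$y - fitted(fit))
  # Sum the scores within each cluster, then form the "meat" matrix
  meat <- crossprod(rowsum(scores, cluster))
  # Inverse information matrix ("bread"); dispersion is fixed at 1 for binomial
  bread <- vcov(fit)
  G <- length(unique(cluster))
  # G / (G - 1) small-sample adjustment (assumed to mirror Stata's logit)
  (G / (G - 1)) * bread %*% meat %*% bread
}

# Simulated toy data (not the paper's): 200 observations in 20 clusters
set.seed(1)
sim <- data.frame(CTR = rep(1:20, each = 10), x = rnorm(200))
sim$y <- rbinom(200, 1, plogis(-0.5 + 0.8 * sim$x))

fit <- glm(y ~ x, family = binomial, data = sim)
V <- cluster_vcov(fit, sim$CTR)
print(cbind(estimate = coef(fit), cluster_se = sqrt(diag(V))))
```

With the real data this would be `cluster_vcov(mylogit4, df$CTR)`, valid only if no observations were dropped for missing values.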

As you can see, I replaced the "i.MARKERS" variable with the existing variables REL, LING, RACE and REG, which represent the categories of i.MARKERS.

It seems that indicator variables lose their "layer structure" (the categories behind the "i." prefix) when imported. All I get after importing is the numeric REGDUM variable:

```
summary(df$REGDUM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.000   3.000   2.869   5.000   5.000
```
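
The `i.` prefix belongs to the Stata command, not to the .dta file, so `read_dta()` can only return REGDUM as a numeric column (possibly with value labels attached); nothing was actually lost on import. Because REGDUM is categorical, the mean and quartiles from `summary()` say little. A frequency table is more useful, and converting the column to a factor once lets later models treat it like `i.REGDUM`. A minimal sketch on a made-up vector standing in for `df$REGDUM`:

```r
# Made-up stand-in for df$REGDUM as read_dta() returns it: plain numeric codes
regdum <- c(0, 1, 1, 3, 5, 5, 2, 4, 3, 5)

# A frequency table is more informative than summary() for a categorical code
print(table(regdum))

# Convert to a factor; with the real data, e.g.
# df$REGDUM <- factor(haven::zap_labels(df$REGDUM))
regdum_f <- factor(regdum)
print(levels(regdum_f))  # "0" "1" "2" "3" "4" "5"; "0" is the reference level
```

If the .dta file stores value labels for REGDUM, `haven::as_factor()` is an alternative that uses those labels as level names instead of the raw codes.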

My next idea was to create a separate variable for each category of i.REGDUM, analogous to REL, LING, RACE and REG, and then run the logit analysis with those. This is where I'm stuck, though: I can't figure out how to create such variables and add them to the data frame. Any help would be much appreciated.
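
Creating one 0/1 column per region works, but it is not strictly necessary: writing `factor(REGDUM)` in the `glm()` formula builds the same indicators on the fly. The sketch below does both on simulated data (made-up outcome and region codes, not the paper's) to show that the two fits agree. The column names `REGDUM_1` to `REGDUM_5` are hypothetical; region 0 is left out as the base category, mirroring the default of Stata's `i.REGDUM`.

```r
# Simulated toy data (not the paper's): a binary outcome and a region code 0-5
set.seed(42)
toy <- data.frame(REGDUM = rep(0:5, each = 30))
toy$ONSET <- rbinom(nrow(toy), 1, plogis(-1 + 0.3 * toy$REGDUM))

# Option 1: add one 0/1 dummy column per non-base region
# (hypothetical names REGDUM_1 to REGDUM_5)
for (k in 1:5) {
  toy[[paste0("REGDUM_", k)]] <- as.integer(toy$REGDUM == k)
}
fit_dummies <- glm(ONSET ~ REGDUM_1 + REGDUM_2 + REGDUM_3 + REGDUM_4 + REGDUM_5,
                   family = binomial, data = toy)

# Option 2: let R build the same indicators from a factor (region 0 is the base)
fit_factor <- glm(ONSET ~ factor(REGDUM), family = binomial, data = toy)

# Same design matrix, so the estimates agree; only the coefficient names differ
print(cbind(dummies = coef(fit_dummies), factor = coef(fit_factor)))
```

For `mylogit4`, the factor route amounts to changing the last formula term from `REGDUM` to `factor(REGDUM)`.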

Thanks in advance!
