Hello guys, I'm new to programming, so please excuse the possible rookie mistakes.
I need to reproduce a logit analysis table from an academic paper as a training exercise. The Stata code (.do file) reads as follows:
logit ONSET EXCL LOSS SIZE PRECON i.MARKERS ///
ONGOING lGDP lPOP PYS SPLINE ///
DEMOC ELF i.REGDUM ///
, nolog cluster(CTR)
All the variables come from a data frame that I imported with read_dta(). I managed to reproduce the model for every variable except "i.REGDUM", which encodes world regions as a categorical variable with the discrete values 0-5.
library(haven)
df <- read_dta("https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/JNVYN0/DR8FNE&version=1.0")
mylogit4 <- glm(formula = ONSET ~ EXCL + LOSS + SIZE + PRECON + REL + LING + RACE + REG +
                  ONGOING + lGDP + lPOP + PYS + SPLINE1 + SPLINE2 + SPLINE3 +
                  DEMOC + ELF + REGDUM,
                family = "binomial", data = df)
As you can see, I replaced the "i.MARKERS"-variable with readily available variables "REL + LING + RACE + REG", which represent the layers of i.MARKERS.
It seems like indicator variables lose their "layer structure" when imported. What I get after importing is the "REGDUM"-variable:
summary(df$REGDUM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.000   3.000   2.869   5.000   5.000
My next idea was to create a separate variable for each layer of i.REGDUM, analogous to REL + LING + RACE + REG, and then run the logit analysis with those. This is where I'm stuck, though: I can't figure out how to create such variables and add them to the data frame. I'd be very glad for any help.
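For reference, here is a minimal sketch of what I think this would look like, written against a made-up toy data frame rather than the real data. The names `toy`, `dummies`, `fit_factor`, `fit_dummies` and the `REGDUM_1` ... `REGDUM_5` columns are hypothetical, and the values are random. It assumes REGDUM arrives as a plain numeric column; if it carries Stata value labels, haven's as_factor() might be the better conversion. Wrapping the variable in factor() seems to be the closest R counterpart to Stata's "i." prefix, with the lowest value (0) dropped as the reference category.

```r
# Toy stand-in for the real data: column names mirror the question,
# the values are random and purely illustrative.
set.seed(1)
toy <- data.frame(
  ONSET  = rbinom(200, 1, 0.3),
  EXCL   = rnorm(200),
  REGDUM = sample(0:5, 200, replace = TRUE)
)

# Option 1: treat REGDUM as categorical directly in the formula.
# R expands the factor into 0/1 dummies and drops the first level (0)
# as the reference category.
fit_factor <- glm(ONSET ~ EXCL + factor(REGDUM), family = binomial, data = toy)

# Option 2: build explicit 0/1 columns and append them to the data frame.
dummies <- model.matrix(~ factor(REGDUM), data = toy)[, -1]  # drop the intercept column
colnames(dummies) <- paste0("REGDUM_", 1:5)
toy <- cbind(toy, dummies)

fit_dummies <- glm(
  ONSET ~ EXCL + REGDUM_1 + REGDUM_2 + REGDUM_3 + REGDUM_4 + REGDUM_5,
  family = binomial, data = toy
)

# Compare the coefficient estimates of the two fits.
all.equal(unname(coef(fit_factor)), unname(coef(fit_dummies)))
```

If this is right, option 1 would be enough for the regression itself, and option 2 is only needed when the dummy columns have to exist in the data frame.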
Thanks in advance!