Hi, I am trying to run a binomial logistic regression with fixed and random effects after imputing data (using mice) for three of the fixed-effect variables. My data set has about 1,030 observations and looks like this:
dpasta(head(IMPSC))
data.frame(stringsAsFactors=FALSE,
  MCLUSTER = c(344, 398, 140, 140, 140, 140),
  MNUMBER = c(118, 113, 156, 89, 164, 219),
  M01 = c(3, 3, 3, 3, 3, 6),
  MREGION = c(3, 3, 1, 1, 1, 1),
  MTYPE = c(2, 1, 2, 2, 2, 2),
  ID = c(3070, 5000, 5001, 5002, 5005, 5006),
  CRP = c(0.38, 0.27, 2.44, 1.07, 0.82, 1.36),
  CRP1 = c(-0.967584026261706, -1.30933331998376, 0.89199803930511,
           0.0676586484738149, -0.198450938723838, 0.307484699747961),
  CRP1decile = c(3L, 2L, 7L, 5L, 4L, 5L),
  AGP = c(0.95, 0.67, 0.49, 0.62, 0.53, 1.2),
  AGP1 = c(-0.0512932943875506, -0.400477566597125,
           -0.713349887877465, -0.478035800943, -0.63487827243597,
           0.182321556793955),
  AGP1decile = c(4L, 2L, 1L, 2L, 1L, 6L),
  RBPADJ = c(1.34182692107781, 1.07024733187185, 1.33, 1.11931875401758,
             1.02, 0.744077335961651),
  zlen = c(0.05, -2.36, -3, -2.15, -1.43, -2.27),
  stunting = c(0, 1, 1, 1, 0, 1),
  sex = c(1, 2, 2, 1, 2, 1),
  age_month = c(10, 27, 40, 32, 27, 19),
  agegroup = c("a.6-12", "d.25-36", "e.37-59", "d.25-36", "d.25-36",
               "c.19-24"),
  wgt = c(3.570834, 3.235285, 0.191198, 0.191198, 0.191198, 0.191198),
  SUGAR_VITA = c(0.5225, NA, 1.122, 7.6615, 25.597, NA),
  oil_vitA = c(3, NA, 4.99, 6.25, NA, 3),
  RBPNEWstatus = c("ok", "ok", "ok", "ok", "ok", "ok"),
  HV270 = c("1", "3", "1", "NA", "5", "5")
)
I imputed data for the variables "SUGAR_VITA" (about 300 observations), "oil_vitA" (about 200 observations), and "HV270" (only 2 observations). For the imputation I used the code below:
cols<-c("MCLUSTER","MNUMBER","M01","MREGION","MTYPE","ID","CRP1decile","AGP1decile","stunting","sex","RBPNEWstatus")
IMPSC[cols]<- lapply(IMPSC[cols], factor)
head(IMPSC)
install.packages("mice")
library(mice)
summary(IMPSC)
md.pattern(IMPSC)
install.packages("VIM")
library(VIM)
marginplot(IMPSC[c(20,21)], col=c("blue","red","orange"))
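# labels must come from the data set itself; names(data) refers to the base R function `data` and returns NULL
aggr_plot <- aggr(IMPSC, col=c('navyblue','red'), numbers=TRUE, sortVars=TRUE, labels=names(IMPSC), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))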
pbox(IMPSC,pos=14)
PSC1<- mice(IMPSC,m=5,maxit = 50,method = "pmm")
summary(PSC1)
#checking distribution of imputed values
PSC1$imp$HV270
xyplot(PSC1,SUGAR_VITA~RBPADJ+stunting+sex, pch=18, cex=1)
I then want to run a binomial logistic regression with stunting as the dependent variable. The code below is what I've come up with after looking around:
model3<- pool(with(PSC1,glmer(stunting~RBPADJ+agegroup+sex+SUGAR_VITA+oil_vitA+(1|MCLUSTER),family = binomial(link="logit"))))
summary(model3)
The challenge is that the output only presents the pooled fixed-effect estimates:
  term              estimate std.error statistic       df           p.value
1 (Intercept)     -3.0511369  0.471270  -6.47428  911.599 0.000000000155471
2 RBPADJ          -0.3491726  0.244515  -1.42802 1022.879 0.153590840968638
3 agegroupb.13-18  0.3990366  0.364327   1.09527 1023.722 0.273655390374537
4 agegroupc.19-24  0.9998021  0.368061   2.71640 1022.441 0.006710987842431
5 agegroupd.25-36  1.3073237  0.317882   4.11260 1023.545 0.000042259512833
6 agegroupe.37-59  0.8975109  0.306736   2.92601 1023.392 0.003509263444354
7 sex              1.0117249  0.151334   6.68538 1012.724 0.000000000037992
8 SUGAR_VITA       0.0111669  0.014416   0.77463  167.912 0.439648798112383
9 oil_vitA         0.0017863  0.011450   0.15600   99.432 0.876346788197479
and not the whole glmer summary, which I want and need in order to build my model (after evaluating the output for this model, I will add "HV270" in the next stage).
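One possible direction, given only as a sketch: keep the result of with() before it goes into pool(), since that object stores one fitted glmer model per imputed data set in its analyses component. The toy data below are simulated and hypothetical (the names toy, cluster, x, z, y, imp and fits are not from my real data); it assumes the mice and lme4 packages are installed.

```r
library(mice)  # mice(), with() method for imputed data
library(lme4)  # glmer()

set.seed(1)

# Hypothetical toy data standing in for IMPSC: 20 clusters of 10 rows each
n <- 200
toy <- data.frame(
  cluster = rep(1:20, each = 10),
  x = rnorm(n),
  z = rnorm(n)
)
u <- rnorm(20, sd = 1)  # simulated cluster-level random intercepts
toy$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * toy$x + u[toy$cluster]))
toy$z[sample(n, 40)] <- NA  # make one predictor partly missing

# Impute, then fit the same mixed model on each completed data set
imp  <- mice(toy, m = 5, method = "pmm", printFlag = FALSE)
fits <- with(imp, glmer(y ~ x + z + (1 | cluster), family = binomial))

# Unpooled fits: one glmer model per imputation, each with a full summary
# (fixed effects, random-effect variances, AIC, and so on)
summary(fits$analyses[[1]])
```

The pooled table would still come from pool(fits); the per-imputation summaries are only for inspecting each fit and do not replace the pooled inference.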
- Any suggestions on how I can approach this?
- How do I center the sugar and oil variables, given that each has been imputed 5 times?
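On the centering question, one option, sketched here under assumptions, is to center each variable within each imputed data set rather than once overall. The toy data frame below is hypothetical: it only mimics the stacked "long" layout with an .imp column identifying the imputation, and its values are invented. The helper name center_by_imp is also made up for illustration.

```r
# Hypothetical long-format data: 2 imputations (.imp) of 3 rows each.
# Values are invented for illustration only.
long <- data.frame(
  .imp       = rep(1:2, each = 3),
  .id        = rep(1:3, times = 2),
  SUGAR_VITA = c(1, 2, 6, 1, 5, 6),
  oil_vitA   = c(3, 4, 5, 3, 7, 5)
)

# Subtract the within-imputation mean from each value;
# na.rm = TRUE so that rows with missing values do not turn the mean into NA
center_by_imp <- function(x, imp) {
  x - ave(x, imp, FUN = function(v) mean(v, na.rm = TRUE))
}

long$SUGAR_c <- center_by_imp(long$SUGAR_VITA, long$.imp)
long$oil_c   <- center_by_imp(long$oil_vitA, long$.imp)

# Each imputation's centered values should now average to zero
tapply(long$SUGAR_c, long$.imp, mean)
```

With the real object, the stacked data would presumably come from complete(PSC1, "long", include = TRUE) and be converted back with as.mids() before refitting; that part is an assumption and is not run here.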