GLMER on MICE imputed data

Hendrina · March 25, 2020, 11:00am

Hi, I am trying to run a binomial logistic regression with fixed and random effects after imputing data (using mice) for three of the fixed-effect variables. My data has about 1,030 observations and looks like this:

> dpasta(head(IMPSC))
> data.frame(stringsAsFactors=FALSE,
+        MCLUSTER = c(344, 398, 140, 140, 140, 140),
+         MNUMBER = c(118, 113, 156, 89, 164, 219),
+             M01 = c(3, 3, 3, 3, 3, 6),
+         MREGION = c(3, 3, 1, 1, 1, 1),
+           MTYPE = c(2, 1, 2, 2, 2, 2),
+              ID = c(3070, 5000, 5001, 5002, 5005, 5006),
+             CRP = c(0.38, 0.27, 2.44, 1.07, 0.82, 1.36),
+            CRP1 = c(-0.967584026261706, -1.30933331998376, 0.89199803930511,
+                     0.0676586484738149, -0.198450938723838, 0.307484699747961),
+      CRP1decile = c(3L, 2L, 7L, 5L, 4L, 5L),
+             AGP = c(0.95, 0.67, 0.49, 0.62, 0.53, 1.2),
+            AGP1 = c(-0.0512932943875506, -0.400477566597125,
+                     -0.713349887877465, -0.478035800943, -0.63487827243597,
+                     0.182321556793955),
+      AGP1decile = c(4L, 2L, 1L, 2L, 1L, 6L),
+          RBPADJ = c(1.34182692107781, 1.07024733187185, 1.33, 1.11931875401758,
+                     1.02, 0.744077335961651),
+            zlen = c(0.05, -2.36, -3, -2.15, -1.43, -2.27),
+        stunting = c(0, 1, 1, 1, 0, 1),
+             sex = c(1, 2, 2, 1, 2, 1),
+       age_month = c(10, 27, 40, 32, 27, 19),
+        agegroup = c("a.6-12", "d.25-36", "e.37-59", "d.25-36", "d.25-36",
+                     "c.19-24"),
+             wgt = c(3.570834, 3.235285, 0.191198, 0.191198, 0.191198, 0.191198),
+      SUGAR_VITA = c(0.5225, NA, 1.122, 7.6615, 25.597, NA),
+        oil_vitA = c(3, NA, 4.99, 6.25, NA, 3),
+    RBPNEWstatus = c("ok", "ok", "ok", "ok", "ok", "ok"),
+           HV270 = c("1", "3", "1", "NA", "5", "5")
+ )

I imputed data for the variables "SUGAR_VITA" (about 300 obs missing), "oil_vitA" (about 200 obs), and "HV270" (only 2 obs). For the imputation I used the code below:

cols<-c("MCLUSTER","MNUMBER","M01","MREGION","MTYPE","ID","CRP1decile","AGP1decile","stunting","sex","RBPNEWstatus")
IMPSC[cols]<- lapply(IMPSC[cols], factor)
head(IMPSC)

install.packages("mice")
library(mice)
summary(IMPSC)

md.pattern(IMPSC)
install.packages("VIM")
library(VIM)
marginplot(IMPSC[c(20,21)], col=c("blue","red","orange"))

aggr_plot <- aggr(IMPSC, col=c('navyblue','red'), numbers=TRUE, sortVars=TRUE, labels=names(IMPSC), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))


pbox(IMPSC,pos=14)

PSC1<- mice(IMPSC,m=5,maxit = 50,method = "pmm")
summary(PSC1)

#checking distribution of imputed values
PSC1$imp$HV270
xyplot(PSC1,SUGAR_VITA~RBPADJ+stunting+sex, pch=18, cex=1)

I then want to run a binomial logistic regression with stunting as the dependent variable. The code below is what I've come up with after looking around:

model3<- pool(with(PSC1,glmer(stunting~RBPADJ+agegroup+sex+SUGAR_VITA+oil_vitA+(1|MCLUSTER),family = binomial(link="logit"))))

summary(model3)

The problem is that the output shows only the pooled fixed-effect coefficients:

             term   estimate std.error statistic       df           p.value
1     (Intercept) -3.0511369  0.471270  -6.47428  911.599 0.000000000155471
2          RBPADJ -0.3491726  0.244515  -1.42802 1022.879 0.153590840968638
3 agegroupb.13-18  0.3990366  0.364327   1.09527 1023.722 0.273655390374537
4 agegroupc.19-24  0.9998021  0.368061   2.71640 1022.441 0.006710987842431
5 agegroupd.25-36  1.3073237  0.317882   4.11260 1023.545 0.000042259512833
6 agegroupe.37-59  0.8975109  0.306736   2.92601 1023.392 0.003509263444354
7             sex  1.0117249  0.151334   6.68538 1012.724 0.000000000037992
8      SUGAR_VITA  0.0111669  0.014416   0.77463  167.912 0.439648798112383
9        oil_vitA  0.0017863  0.011450   0.15600   99.432 0.876346788197479

rather than the full glmer summary, which I need in order to build my model (after evaluating the output for this model, I will add "HV270" in the next stage).
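For reference, one way this might be approached, as a rough sketch rather than a definitive answer: the object returned by with() keeps one fitted model per imputation in its analyses list, so each glmer fit can be inspected before (or instead of) pooling. Everything named below (toy, imp, fits, x, z, cluster, re_var) is made up for illustration, and the simulated data is not the IMPSC data.

```r
library(mice)
library(lme4)

set.seed(1)

# Hypothetical toy data: binary outcome y, two covariates, 20 clusters
n <- 200
toy <- data.frame(
  cluster = factor(rep(1:20, each = 10)),
  x = rnorm(n),
  z = rnorm(n)
)
u <- rnorm(20, sd = 0.5)  # cluster-level random intercepts
toy$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * toy$x + u[as.integer(toy$cluster)]))
toy$z[sample(n, 40)] <- NA  # make one covariate partly missing

# Impute, then fit the mixed model on every completed data set
imp  <- mice(toy, m = 3, maxit = 5, method = "pmm", printFlag = FALSE)
fits <- with(imp, glmer(y ~ x + z + (1 | cluster),
                        family = binomial(link = "logit")))

# fits$analyses is a plain list holding one glmer fit per imputation
length(fits$analyses)

# Full glmer summary (random effects, fit statistics) for the first imputation
summary(fits$analyses[[1]])

# Random-intercept variance from each imputation, to compare across fits
re_var <- sapply(fits$analyses, function(f) as.data.frame(VarCorr(f))$vcov[1])
re_var
```

With the objects from this post, the same idea would be to store with(PSC1, glmer(...)) in a variable first and call summary() on its analyses elements; the pooled table then remains available by passing that variable to pool().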

Any suggestions on how I can approach this?
Also, how do I center the sugar and oil variables, given that each is imputed 5 times?
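On the centering question, a possible sketch, under the assumption that centering within each imputed data set is what is wanted: stack the completed data sets in long format, subtract the per-imputation mean, and convert back. The small data frame below is a hand-made stand-in for what complete(PSC1, action = "long", include = TRUE) would return, and the column name SUGAR_c is hypothetical; the mice calls appear only as comments so the example runs in base R.

```r
# Toy stand-in for: long <- complete(PSC1, action = "long", include = TRUE)
# .imp = 0 is the original data (still containing NA);
# .imp = 1 and 2 are two completed copies with different imputed values.
long <- data.frame(
  .imp = rep(0:2, each = 4),
  .id  = rep(1:4, times = 3),
  SUGAR_VITA = c(0.5, NA,  1.1, 7.7,
                 0.5, 2.0, 1.1, 7.7,
                 0.5, 4.0, 1.1, 7.7)
)

# Center within each imputation: subtract that imputation's own mean
long$SUGAR_c <- ave(long$SUGAR_VITA, long$.imp,
                    FUN = function(v) v - mean(v, na.rm = TRUE))

# Mean of the centered variable per imputation (should be about zero)
centered_means <- tapply(long$SUGAR_c, long$.imp, mean, na.rm = TRUE)
centered_means

# With mice loaded, the long data could then be converted back for modelling:
# PSC1c <- as.mids(long)
```

A simpler alternative that may be enough here is to center inside the model formula, for example scale(SUGAR_VITA, scale = FALSE), which is then evaluated separately on each completed data set by with().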
