Length of inverse Mills ratio differs from the original dataset

After fitting a probit regression, I am trying to calculate the inverse Mills ratio following the documentation: invMillsRatio function - RDocumentation

However, I cannot add the "IMR1" output of invMillsRatio() to my dataset as a new column, because the dataset and the output of invMillsRatio() have different numbers of rows. In the example in the R documentation above, both have the same length.

Can someone help me understand why the dataset and the output of invMillsRatio() have different lengths? (5600 vs. 1571 rows)

My code:

library(stats) # for probit regression
library(sampleSelection) # for inverse Mills ratio

surviveprobit <- glm(survive ~ ldnpt_1+ldrst_1+ldinv_1,
family = binomial(link = "probit"),
data = pfe1)
summary(surviveprobit)

# inverse Mills ratio

temp <- invMillsRatio(surviveprobit)
pfe1$imr1 <- invMillsRatio(surviveprobit)$IMR1

Error:

Error in `$<-.data.frame`(`*tmp*`, imr1, value = c(9.71300209403677e-12, :
  replacement has 1571 rows, data has 5600

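I realized that the difference in the number of observations exactly equals the number of rows with missing values in the dataset. glm() drops incomplete rows by default (na.action = na.omit), so the fitted model, and therefore the output of invMillsRatio(), covers only the complete cases. I dropped the rows with missing values and repeated the code. This time it worked: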

library(stats) # for probit regression
library(sampleSelection) # for inverse Mills ratio
library(tidyr) # for drop_na()

surviveprobit <- glm(survive ~ ldnpt_1+ldrst_1+ldinv_1,
family = binomial(link = "probit"),
data = pfe1)

pfe2 <- drop_na(pfe1)
pfe2$imr1 <- invMillsRatio(surviveprobit)$IMR1 # inverse Mills ratio
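
One caveat with the approach above: drop_na(pfe1) removes rows with a missing value in any column, while glm() only drops rows with missing values in the model variables, so the two row counts match only when the missing values are confined to those variables. A possible alternative that keeps all rows is to compute the ratio by hand from the linear predictor. The sketch below is not the sampleSelection implementation; it assumes IMR1 is dnorm(xb) / pnorm(xb), and it uses a made-up data frame (toy) with a single predictor in place of pfe1.

```r
# Hypothetical toy data standing in for pfe1; two rows have a missing predictor.
set.seed(1)
toy <- data.frame(
  survive = rbinom(50, 1, 0.6),
  ldnpt_1 = rnorm(50)
)
toy$ldnpt_1[c(3, 10)] <- NA

# glm() drops the incomplete rows before fitting (default na.action = na.omit),
# so the model has fewer observations than the data frame has rows.
fit <- glm(survive ~ ldnpt_1,
           family = binomial(link = "probit"),
           data = toy)
nobs(fit)
nrow(toy)

# predict() with newdata returns one value per input row,
# with NA where a predictor is missing.
xb <- predict(fit, newdata = toy, type = "link")

# Inverse Mills ratio computed by hand (assumed definition of IMR1).
# The result has the same length as the data, so the assignment works.
toy$imr1 <- dnorm(xb) / pnorm(xb)
```

Rows with missing predictors end up with NA in imr1 instead of being removed, which keeps the new column aligned with the original data.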
