data simulation from IRT 2PL to IRT 3PL model

superyinuo · July 27, 2020, 8:59pm

Hello,
Below is the code for generating dichotomous data based on the IRT 2PL model. I want to generate dichotomous data using the IRT 3PL model. How should I incorporate the c parameter in the code? Thanks!

twopl.sim <- function( nitem = 20, npers = 100 ) {

  i.loc <- rnorm( nitem )
  p.loc <- rnorm( npers )
  i.slp <- rlnorm( nitem, sdlog = .4 )

  temp <- matrix( rep( p.loc, length( i.loc ) ), ncol = length( i.loc ) )

  logits <- t( apply( temp , 1, '-', i.loc) )
  logits <- t( apply( logits, 1, '*', i.slp) )

  probabilities <- 1 / ( 1 + exp( -logits ) )

  resp.prob <- matrix( probabilities, ncol = nitem)

  obs.resp <- matrix( sapply( c(resp.prob), rbinom, n = 1, size = 1), ncol = length(i.loc) )

  output <- list()
  output$i.loc <- i.loc
  output$i.slp <- i.slp
  output$p.loc <- p.loc
  output$resp <- obs.resp

  output
}

wjakethompson · July 27, 2020, 10:07pm

Hi! The 3PL IRT model is usually defined as shown here.
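For reference, one common way to write the 3PL model is sketched below. The notation is an assumption chosen to match the code in this thread: a_i is the item slope (i.slp), b_i the item location (i.loc), c_i the guessing parameter, and \theta_j the person location (p.loc).

```latex
P(X_{ij} = 1 \mid \theta_j) = c_i + \frac{1 - c_i}{1 + \exp\left(-a_i(\theta_j - b_i)\right)}
```

Setting c_i = 0 for every item recovers the 2PL model.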

First, we can define a function, irt_prob, that calculates the probability of a correct response, with or without c parameters. Then we pass that function to one more apply() call. The full code is below:

irt_prob <- function(logit, c = NULL) {
  if (is.null(c)) {
    1 / ( 1 + exp( -logit ) )
  } else {
    c + ((1 - c) / (1 + exp( -logit ) ) )
  }
}

irt.sim <- function( nitem = 20, npers = 100 ) {
  
  i.loc <- rnorm( nitem )
  p.loc <- rnorm( npers )
  i.slp <- rlnorm( nitem, sdlog = .4 )
  i.gus <- rbeta( nitem, shape1 = 5, shape2 = 17)
  
  temp <- matrix( rep( p.loc, length( i.loc ) ), ncol = length( i.loc ) )
  
  logits <- t( apply( temp , 1, '-', i.loc ) )
  logits <- t( apply( logits, 1, '*', i.slp ) )
  
  # For 2PL: 
  # probabilities <- t( apply( logits, 1, irt_prob ) )
  
  # For 3PL:
  probabilities <- t( apply( logits, 1, irt_prob, c = i.gus ) )
  
  resp.prob <- matrix( probabilities, ncol = nitem)

  obs.resp <- matrix( sapply( c(resp.prob), rbinom, n = 1, size = 1), ncol = length(i.loc) )
  
  output <- list()
  output$i.loc <- i.loc
  output$i.slp <- i.slp
  output$i.gus <- i.gus  # also return the generating c-parameters
  output$p.loc <- p.loc
  output$resp <- obs.resp
  
  output
}

I've generated the c-parameters from a beta(5,17) distribution, which has a mean of 5/22, or about .23.
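As an aside, the same simulation can be written without the apply() calls. The sketch below is a hypothetical alternative, not the code from the post: sim_3pl and its shape1/shape2 arguments are names made up for illustration, and it relies only on outer(), sweep(), plogis(), and rbinom() from base R.

```r
# Hypothetical vectorized 3PL simulator; names and defaults are illustrative.
sim_3pl <- function(nitem = 20, npers = 100, shape1 = 5, shape2 = 17) {
  i.loc <- rnorm(nitem)                                    # item locations (b)
  p.loc <- rnorm(npers)                                    # person locations (theta)
  i.slp <- rlnorm(nitem, sdlog = .4)                       # item slopes (a)
  i.gus <- rbeta(nitem, shape1 = shape1, shape2 = shape2)  # guessing parameters (c)

  # npers x nitem matrix of a_i * (theta_j - b_i)
  logits <- sweep(outer(p.loc, i.loc, "-"), 2, i.slp, "*")

  # c_i + (1 - c_i) * logistic(logit), applied column by column
  probs <- sweep(plogis(logits), 2, 1 - i.gus, "*")
  probs <- sweep(probs, 2, i.gus, "+")

  # One Bernoulli draw per person-item cell, kept in the same layout
  resp <- matrix(rbinom(length(probs), size = 1, prob = probs), nrow = npers)

  list(i.loc = i.loc, i.slp = i.slp, i.gus = i.gus, p.loc = p.loc,
       prob = probs, resp = resp)
}

set.seed(2020)
sim <- sim_3pl(nitem = 20, npers = 1000)
dim(sim$resp)    # persons by items
mean(sim$i.gus)  # average simulated guessing parameter
```

Because rbinom() is vectorized over prob, there is no need to call it once per cell with sapply().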

superyinuo · July 28, 2020, 1:09pm

Thanks much for the help!
I have two more questions I need your help with.

1. If I want the average value of my c parameter to be ~.25 (that is, the items are 4-option multiple-choice items), how should I modify the beta distribution?

2. Using the code you provided, I then want to do parameter estimation with the following code, but I get the warning message below. What should I do to improve the code so as to achieve stable estimation?

fit3PL <- tpm(sim.resp, type = "latent.trait", IRT.param = TRUE)

Warning message:
In tpm(sim.resp, type = "latent.trait", IRT.param = TRUE) :
Hessian matrix at convergence contains infinite or missing values; unstable solution.

Much thanks again!

wjakethompson · July 28, 2020, 1:56pm

The Beta distribution has a mean of shape1 / (shape1 + shape2). So, for example, beta(10, 30) would give you a mean of .25. Many combinations of shape1 and shape2 give that same mean (e.g., beta(1, 3), beta(50, 150), etc.). The magnitudes of shape1 and shape2 control the variance. The full formula can be seen here, but in general, the bigger the numbers, the narrower the distribution. See, for example, beta(10, 30) compared to beta(100, 300). Both have an expected value of .25, but they draw from very different ranges of plausible values.
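To make the comparison concrete, here is a small sketch that summarizes a few Beta distributions with the same mean. beta_summary is a hypothetical helper written for this illustration; it uses the standard mean and variance formulas for the Beta distribution plus qbeta() from base R.

```r
# Hypothetical helper: mean, SD, and a central 90% range for beta(shape1, shape2)
beta_summary <- function(shape1, shape2) {
  total <- shape1 + shape2
  c(mean = shape1 / total,
    sd   = sqrt(shape1 * shape2 / (total^2 * (total + 1))),
    q05  = qbeta(.05, shape1, shape2),
    q95  = qbeta(.95, shape1, shape2))
}

# Same mean (.25), increasingly narrow spread
round(rbind("beta(1, 3)"     = beta_summary(1, 3),
            "beta(10, 30)"   = beta_summary(10, 30),
            "beta(100, 300)" = beta_summary(100, 300)), 3)
```

In the simulation function, switching to a mean of .25 would then only require changing the rbeta() call, for example to rbeta( nitem, shape1 = 10, shape2 = 30 ).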

As for the warning, it's hard to know for sure, as I'm not familiar with the tpm() function. I generally use the mirt package or brms/rstan. My guess is that your sample is a little small, especially if you used the default of npers = 100. As a rule of thumb, I usually aim for 500-1000 respondents for a 3PL model. Another problem I've run into before is an item with 0 variance (e.g., all of the simulated responses are 0). If everyone gets an item wrong, then it may look "infinitely difficult". So I would add a check to your function to make sure that, after generating the item responses, no item is either all correct or all incorrect. If any such items exist, re-generate the responses.
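One way the zero-variance check could look is sketched below. Both function names are hypothetical, and the wrapper assumes only that the simulator returns a list with a resp matrix (persons in rows, items in columns), as the functions in this thread do.

```r
# TRUE if any item (column) is answered all 0 or all 1
has_constant_item <- function(resp) {
  item_means <- colMeans(resp)
  any(item_means == 0 | item_means == 1)
}

# Hypothetical wrapper: call `sim_fun` until every item shows both 0s and 1s.
# `...` is passed on to `sim_fun`; `max_tries` guards against an endless loop.
sim_until_valid <- function(sim_fun, ..., max_tries = 100) {
  for (i in seq_len(max_tries)) {
    sim <- sim_fun(...)
    if (!has_constant_item(sim$resp)) {
      return(sim)
    }
  }
  stop("No data set without constant items after ", max_tries, " tries")
}
```

With the simulation function from earlier in the thread, this would be called as sim_until_valid(irt.sim, nitem = 20, npers = 1000).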

Hopefully this helps!

superyinuo · July 28, 2020, 2:36pm

Thanks much for the teaching!
A big thumb up for you!!
