Generalised Linear Model - Response error is constant; boundary (singular) fit; leading leading minor of order 4 is not positive definite

Hello all,

I am trying to run the code below in order to simulate P-values using a generalised linear model . (I am running a logistic regression for power analysis):

#which_p_value = "x1"
which_p_value = "groupcategory"
#which_p_value = "x1:groupcategory"

run_anova = FALSE 
simulate_mixed_effect = TRUE 
mixed_effect_sd = 3.094069
mixed_effect_sd_slope = 3.098661

library(tidyverse)
n_people <- c(2,5,10,15,20)
coef1 <- 1.61
coef2 <- -0.01
#coef3 <- 5
#coef4 <- 0

g1 = 0
g2 = 1
g3 = 2 
distances <- c(60,90,135,202.5,303.75,455.625)/100
n_trials <- 35
oneto1000 <- 25
n_track_lengths <- length(distances)
groupcategory = c(rep(g1, n_track_lengths), rep(g2, n_track_lengths),rep(g3,n_track_lengths))
z = c(n_people)
emptydataframeforpowerplots = NULL

coef3s <- c(-5, -4, -3, -2,-1, 0, 1, 2, 3, 4, 5)
coef4s <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)

Datarray <- array(dim=c(length(coef3s), length(coef4s),length(n_people)))

coef3_counter =1
for (coef3 in coef3s) {
  coef4_counter =1
  for (coef4 in coef4s) {
    z1_g2 <- coef1 + coef2*distances + coef3*g2 + coef4*g2*distances
    z1_g3 <- coef1 + coef2*distances + coef3*g3 + coef4*g3*distances
    d = NULL
    pr1 = 1/(1+exp(-z1_g2))
    pr2 = 1/(1+exp(-z1_g3))

    counter=1
    for (i in n_people) {
      for (j in 1:oneto1000){
        df <- c()
        for (k in 1:i){

          # random effect from drawing a random intercept with sd = x

          if (simulate_mixed_effect){
            coef1_r = rnorm(1, mean=coef1, sd=mixed_effect_sd)
            coef2_r = rnorm(1, mean=coef1, sd=mixed_effect_sd_slope)
          } else {
            coef1_r = coef1
            coef2_r = coef2
          }

          z_g1 <- coef1_r + coef2*distances + coef3*g1 + coef4*g1*distances
          pr = 1/(1+exp(-z_g1))

          z1_g2 <- coef1_r + coef2*distances + coef3*g2 + coef4*g2*distances
          pr1 = 1/(1+exp(-z1_g2))

          if (run_anova) {
            df <- rbind(df, data.frame(x1 = c(rep(distances, 3)),
                                     y = c(rbinom(n_track_lengths,n_trials,pr), rbinom(n_track_lengths,n_trials,pr1),rbinom(n_track_lengths,n_trials,pr2)),
                                     groupcategory = groupcategory, id = c(rep(k,18))))

          } else { # this is for glmer data organisation
            for (m in 1:n_trials) {
            df <- rbind(df, data.frame(x1 = c(rep(distances, 3)),
 y = c(rbinom(n_track_lengths,1,pr),rbinom(n_track_lengths,1,pr1),rbinom(n_track_lengths,1,pr2)),groupcategory = groupcategory,id = c(rep(k,18))))

            }
          }
        }

        if (run_anova) {
          #df_aov <- aov(y~x1*groupcategory+Error(id/(x1*groupcategory)),data=df)
          #df_aov_sum <- summary(df_aov)
          #pvalue <- df_aov_sum[[5]][[1]][which_p_value,"Pr(>F)"]

          df_aov <- aov(y~x1*groupcategory+Error(id),data=df)
          df_aov_sum <- summary(df_aov)
          pvalue <- df_aov_sum[[2]][[1]][which_p_value, "Pr(>F)"]

        } else { # glmer
          mod_group_glmer <-  glmer(y ~ x1 + groupcategory + (1+x1|id), data = df, family = "binomial")
          sum <- summary(mod_group_glmer)
          pvalue <- sum$coefficients[which_p_value, "Pr(>|z|)"]
        }

        d = rbind(d,data.frame(pvalue))
      }
      count <- plyr::ldply(d,function(c) sum(c<=0.05))
      Datarray[coef3_counter,coef4_counter,counter] <- count$V1/oneto1000
      counter = counter +1
      d = NULL
    }
    coef4_counter = coef4_counter + 1
  }
  coef3_counter = coef3_counter + 1
}

Everytime I try to run the code...

I am getting the error: Error: response is constant.

Additionally, the code seems to take a very long time to run before it then prints the error...

Does anybody know why?

The debugger gives me this script:

Lapack routine dgesv: system is exactly singular: U[6,6] = 0

Which from my understanding could either indicate collinearity or another problem such as when a random effect variance is estimated to be very near zero and thus very loosely to the data that it is not sufficiently informative to drag the estimate away from the zero starting value.

Additionally, I get the error:

the leading minor of order 4 is not positive definite

And a message along the lines of:

not positive definite or contains NA values: falling back to var-cov estimated from RXModel is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

Here is a snippet of my data and the glmer output:

 df
         x1 y groupcategory id
1   0.60000 1             0  1
2   0.90000 1             0  1
3   1.35000 1             0  1
4   2.02500 1             0  1
5   3.03750 1             0  1
6   4.55625 1             0  1
7   0.60000 0             1  1
8   0.90000 0             1  1
9   1.35000 0             1  1
10  2.02500 0             1  1
11  3.03750 0             1  1
12  4.55625 0             1  1
13  0.60000 0             2  1
14  0.90000 0             2  1
15  1.35000 0             2  1
16  2.02500 0             2  1
17  3.03750 0             2  1
18  4.55625 0             2  1
19  0.60000 1             0  1
20  0.90000 0             0  1
21  1.35000 1             0  1
22  2.02500 1             0  1
23  3.03750 0             0  1
24  4.55625 1             0  1
25  0.60000 0             1  1
26  0.90000 0             1  1
27  1.35000 0             1  1
28  2.02500 0             1  1
29  3.03750 0             1  1
30  4.55625 0             1  1
31  0.60000 0             2  1
32  0.90000 0             2  1
33  1.35000 0             2  1
34  2.02500 0             2  1
35  3.03750 0             2  1
36  4.55625 0             2  1

mod_group_glmer <-  glmer(y ~ x1 + groupcategory + (1+x1|id), data = df, family = "binomial")

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: y ~ x1 + groupcategory + (1 + x1 | id)
   Data: df
      AIC       BIC    logLik  deviance  df.resid 
 3648.796  3693.445 -1818.398  3636.796     12594 
Random effects:
 Groups Name        Std.Dev. Corr 
 id     (Intercept) 4.0727        
        x1          0.3478   -0.95
Number of obs: 12600, groups:  id, 20
Fixed Effects:
  (Intercept)             x1  groupcategory  
       1.2671        -0.1408        -6.8005  

Thanks for the reprex. Only perfect, just missing :grin:

library(lme4)
``
I see preceding errors 

> boundary (singular) fit: see ?isSingular

Have you looked into that? (Program is still running 10 minutes later)

I had a look , and it seems to me that sometimes the data being passed, doesnt have more than one value for y ( the model target), so its not plausible that any model would be found. maybe something like this to check, and skip the impossible to fit scenarios ?

{ # glmer
          #check if there are more than 1 value of y for each groupcategory and id combination
          checkme <- df %>% group_by(groupcategory,id) %>% summarise(miny=min(y),maxy=max(y)) %>% mutate(expectfail = miny==maxy)
          if(any(checkme$expectfail==TRUE)) {
            print("skipping because")
            print(checkme)
          } else {
          mod_group_glmer <-  glmer(y ~ x1 + groupcategory + (1+x1|id), data = df, family = "binomial")
          sum <- summary(mod_group_glmer)
          pvalue <- sum$coefficients[which_p_value, "Pr(>|z|)"]
          }
        }
1 Like

I second @nirgrahamuk's suggestion that the root cause may be the data, rather than the modeling. The suggestion to filter the data and re-run offers the possibility of re-factoring the model around those edge cases or, more likely, reducing the data or acquiring more. This seems like we're edging away from debugging back toward domain expertise in design and data acquisition.

In any event, a 3-day debugging cycle is going to take a very long time to find a cause. Could you trim back the number of iterations?

@technocrat I am currently running it now and I am getting lots of

@nirgrahamuk I am currently running it now and I am getting lots of 'skipping because':

Yes. I think you are saying you are surprised at the data you generated...
You haven't explained anything to us about your motivations/reasons for doing any of this, nor how you chose the details of your data creation. You merely asked for technical explanations as to why you get certain errors. I found one example. But what does it mean for what you are doing or want to understand... I have no idea.

Edited to say: I do think its odd that you get a skipping because message, but then nothing of checkme seems to have printed... when I ran your code with my amendment I got the checkme details that justified the skipping.

1 Like

@nirgrahamuk it still worked in the end after I made the ammendment you suggested!

@technocrat

1 Like

@nirgrahamuk could you please help me come to a solution. The images above are printing but the program is freezing every time I run the code. This is very frustrating and I would appreciate your advice :slight_smile:

@technocrat I'd appreciate whatever guidance you have too. The program seems to freeze/get delayed, as if it's overwhelmed with information. I tried running it only using a sample size of 2 and 15 to reduce the amount of information and the images printed much quicker. However, the "code" was still running and seemed to get frozen/delayed, so I wasn't able to see the output (Values) of the arrays/GLM.

This is what my screen is currently displaying:

@nirgrahamuk

Conceptually what is this work about?
What are you trying to demonstrate?
As far as your data generation approach, how did you decide on it? Do you have other options?

1 Like

not sure how to go about that? There have been other times when the function has worked without the error I'm getting above, even when it has said 'boundary (singular) fit: see? is singular

@technocrat

The code is still running on my admittedly under-powered macOS. The boundary singular error suggests, but I can't yet check, that the data argument may be the cause and that with other data the return value might not throw the response is constant error.

I'll try to run this on my more powerful Linux box and report back.

Ten minutes after the 50ths of these

boundary (singular) fit: see ?isSingular

appeared, all my cores were pegging out. I CTRL-C'd the routine and got five

1: In vcov.merMod(object, use.hessian = use.hessian) :
variance-covariance matrix computed from finite-difference Hessian is not positive
definite or contains NA values: falling back to var-cov estimated from RX

and ran

❯ rlang::last_error()
<error/rlang_error>
Lapack routine dgesv: system is exactly singular: U[6,6] = 0
Backtrace:
 1. lme4::glmer(...)
 2. lme4::optimizeGlmer(...)
 3. lme4:::optwrap(...)
 7. nM$newf(fn(nM$xeval()))
 8. base::stopifnot(length(value <- as.numeric(value)) == 1L)

With the same code given, you are seeing something other?

@technocrat thanks so much! I look forward to hearing back from you!

1 Like

@technocrat

Yes, I was seeing that message before, however I have carried out some debugging and I feel I have got closer to the source of the problem:

The debugger gives me this script:

Lapack routine dgesv: system is exactly singular: U[6,6] = 0

Which from my understanding could either indicate collinearity or another problem such as when a random effect variance is estimated to be very near zero and thus very loosely to the data that it is not sufficiently informative to drag the estimate away from the zero starting value.

When I specify the model as mod_group_glmer <- glmer(y ~ x1 + groupcategory + (1|id), data = df, family = "binomial") rather than mod_group_glmer <- glmer(y ~ x1 + groupcategory + (1+x1|id), data = df, family = "binomial")....

I no longer get the warning message: boundary (singular) fit: see ?isSingular

Thus, I feel that my issue may relate to the way I have specified the model using the glmer formula. When I consider a (1|ID) term which accounts for overall, across-distance ('x1') variation around the linear trend with respect to the proportion of correct trials ('y'), I no longer receive the above warning.

Thus, I will now try running the model again with that new specification and see what happens and get back to you!

Now I get the error you got: the leading minor of order 4 is not positive definite


unable to evaluate scaled gradientModel failed to converge: degenerate  Hessian with 1 negative eigenvaluesvariance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RXvariance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RXunable to evaluate scaled gradientModel failed to converge: degenerate  Hessian with 1 negative eigenvaluesvariance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RXvariance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RXModel is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

@technocrat I have edited the question above to include my glmer output!

1 Like