Confidence Interval: getting "NA" as an answer. Why?

Hello everyone,
I have a problem calculating confidence intervals. I fitted a polr regression and wanted to look at the confidence intervals for my explanatory variables. I get confidence intervals for all variables except "Age", where I get "NA" instead.
Does anyone know why I am getting "NA" and how I can solve this problem?
(By the way, my variable "Age" has 92 observations, and the individuals are between 20 and 29 years old.)

library(MASS)

CHL=as.factor(data$Challenge)
levels(CHL)=c("No affected","Moderate","Affected")

mo2=polr(formula=CHL ~ Age + Distance + gender + help, data, method = c("logistic"), Hess =TRUE)
summary(mo2)

(ci = confint(mo2))
>                 2.5 %        97.5 %
> Age                NA            NA
> Distance  0.000327371  0.0007957107
> gender   -0.419290911  1.3946316454
> help     -2.206255646 -0.4040815382

Thank you very much for your time and help.

I may have missed something, but what is your data? Where can we find data$Challenge?

Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

Unfortunately the system doesn't allow me to upload my data, not even as a PDF...

Individual Age gender Distance Challenge
1 22 1 1107 2
2 21 0 923 1
3 21 0 1107 1
4 20 0 1107 1
5 27 1 690 1
6 22 0 1367 1
7 24 1 1726 2
8 20 0 1107 2
9 23 0 1107 1
10 23 0 2381 2
11 22 1 1174 2
12 21 0 923 1
13 20 1 1107 2
14 24 0 690 1
15 28 0 7221 3
16 25 0 690 1
17 21 0 8890 3
18 22 1 785 2
19 23 0 1107 2
20 21 0 1107 1
21 21 0 1107 1
22 21 0 1107 1
23 23 0 690 2
24 24 1 1904 1
25 25 1 5930 3
26 25 0 529 1
27 26 0 4461 3
28 21 0 1183 1
29 22 1 5930 1
30 20 0 1298 2
31 24 1 614 1
32 25 1 3116 3
33 25 1 3421 2
34 26 1 4213 2
35 24 0 1679 1
36 23 1 8145 3
37 23 1 3116 2
38 22 1 765 1
39 22 0 1107 1
40 24 0 1866 2
41 24 1 5930 2
42 25 1 1054 1
43 23 1 5930 3
44 23 0 1183 2
45 22 1 1367 1
46 21 0 1367 1
47 24 1 1978 1
48 21 0 1107 1
49 24 1 8890 3
50 23 1 1107 2
51 23 0 1367 1
52 23 1 1183 2
53 23 0 8890 2
54 22 0 8890 3
55 23 1 1367 1
56 21 0 1107 1
57 21 0 4511 2
58 21 1 1107 1
59 25 1 1367 1
60 25 1 7630 2
61 24 0 1367 3
62 20 0 1367 2
63 21 0 1367 1
64 23 1 1317 2
65 21 0 923 1
66 20 0 1174 1
67 20 1 1367 2
68 22 1 5373 2
69 20 1 1107 2
70 24 0 1183 1
71 20 0 1367 1
72 21 0 1367 2
73 22 1 1726 1
74 23 0 1183 2
75 20 0 1904 1
76 22 0 2381 1
77 25 0 1183 1
78 21 1 8890 2
79 22 0 1183 2
80 26 1 1183 2
81 20 1 1107 2
82 25 0 1866 1
83 22 1 1367 1
84 20 1 1183 1
85 20 1 1298 1
86 20 0 1367 1
87 24 0 1183 1
88 23 1 1726 1
89 20 1 1367 3
90 25 0 1183 1
91 22 1 3421 3
92 29 1 3886 3

There is no help variable in this data, but there is one in your formula.

See also Best Practices: how to prepare your own data for use in a `reprex` if you can’t, or don’t know how to reproduce a problem with a built-in dataset?

You'll find some guidance there on how to provide data. However, what you did is OK: I was able to copy it and use datapasta::tribble_paste to get usable data in R.
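
For anyone landing here with the same "how do I share my data" problem, here is a minimal sketch of one common approach using base R's dput(). The small data frame is only a stand-in built from the first five rows of the table posted above (it has no help column, because that column was not in the posted table); with real data you would call dput() on your own object and paste the printed text into your post.

```r
# Stand-in data: the first five rows of the table posted in this thread.
# With your own data you would skip this step and use your existing object.
data <- data.frame(
  Individual = 1:5,
  Age        = c(22, 21, 21, 20, 27),
  gender     = c(1, 0, 0, 0, 1),
  Distance   = c(1107, 923, 1107, 1107, 690),
  Challenge  = c(2, 1, 1, 1, 1)
)

# dput() prints R code that recreates the object; paste that text into a post.
dput(data)

# Round trip: capture the printed code, evaluate it, and compare to the original.
txt     <- paste(capture.output(dput(data)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
same    <- isTRUE(all.equal(rebuilt, data))
same
```

Readers can then copy the dput() output straight into their own session, with no file upload needed.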

test.pdf (42.7 KB)

I am so sorry, Sir, this is the data. I forgot to include the variable help.

Can you please run summary on your data and show the output?

summary(data)

That will show us if there are any missing values in the source data and give us a better feel of what your full data set looks like.
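
If you only want the missing-value counts, a more direct check is sketched below. The toy data frame and its values are made up for illustration (one Age value is deliberately set to NA); they are not the poster's data.

```r
# Hypothetical toy data frame with one deliberately missing Age value
toy <- data.frame(
  Age      = c(22, 21, NA, 20),
  Distance = c(1107, 923, 1107, 1107)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# TRUE only if no column contains an NA
all_complete <- !anyNA(toy)
all_complete
```

summary() reports the same information through an extra "NA's" row for any column that has missing values.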

Of course. The summary looks as follows:

> summary(data)
   Individual         Age            gender          Distance      Challenge         help       
 Min.   : 1.00   Min.   :20.00   Min.   :0.0000   Min.   : 529   Min.   :1.00   Min.   :0.0000  
 1st Qu.:23.75   1st Qu.:21.00   1st Qu.:0.0000   1st Qu.:1107   1st Qu.:1.00   1st Qu.:0.0000  
 Median :46.50   Median :22.00   Median :0.0000   Median :1367   Median :1.00   Median :1.0000  
 Mean   :46.50   Mean   :22.61   Mean   :0.4674   Mean   :2345   Mean   :1.63   Mean   :0.5652  
 3rd Qu.:69.25   3rd Qu.:24.00   3rd Qu.:1.0000   3rd Qu.:2079   3rd Qu.:2.00   3rd Qu.:1.0000  
 Max.   :92.00   Max.   :29.00   Max.   :1.0000   Max.   :8890   Max.   :3.00   Max.   :1.0000

Thanks for the data. PDF is not the preferred format, but here is a full reprex for people who want to chime in:

# get data
url <- "https://forum.posit.co/uploads/default/original/2X/9/9429c5c23fed9dc09ef30508b8d837930062d5c7.pdf"
pdf_temp <- tempfile(fileext = ".pdf")
download.file(url, pdf_temp, mode = "wb")

## extract from pdf file
library(pdftools)
library(tidyverse)
data <- pdftools::pdf_text(pdf_temp) %>%
  read_lines() %>%
  str_trim() %>%
  str_split_fixed("[ ]+", n = 6) %>%
  as_tibble() %>%
  set_names(nm = slice(., 1)) %>%
  slice(-1) %>%
  mutate_all(as.numeric)

## actual code ----

library(MASS)
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select

CHL=as.factor(data$Challenge)
levels(CHL)=c("No affected","Moderate","Affected")

mo2=polr(formula=CHL ~ Age + Distance + gender + help, data, method = c("logistic"), Hess =TRUE)
summary(mo2)
#> Call:
#> polr(formula = CHL ~ Age + Distance + gender + help, data = data, 
#>     Hess = TRUE, method = c("logistic"))
#> 
#> Coefficients:
#>               Value Std. Error t value
#> Age       0.1023205  0.0246094   4.158
#> Distance  0.0005437  0.0002367   2.297
#> gender    0.4871502  0.4608772   1.057
#> help     -1.2882294  0.4567059  -2.821
#> 
#> Intercepts:
#>                      Value    Std. Error t value 
#> No affected|Moderate   2.9482   0.0048   610.0403
#> Moderate|Affected      5.7471   0.5225    10.9983
#> 
#> Residual Deviance: 137.6569 
#> AIC: 149.6569

(ci = confint(mo2))
#> Waiting for profiling to be done...
#>                 2.5 %        97.5 %
#> Age                NA            NA
#> Distance  0.000327371  0.0007957107
#> gender   -0.419290911  1.3946316454
#> help     -2.206255646 -0.4040815382

Created on 2018-12-23 by the reprex package (v0.2.1)

I don't know the polr model very well, so I'm not sure whether NA is expected or not...


I think the profiling is failing in confint() for the Age variable. There's a diagnostic plot you can make for the profile, showing the parameter tau for each coefficient. The profile has to span a wide enough range of tau for the requested confidence level (0.95, 0.9, etc.), or else the interval can't be calculated. It looks like Age doesn't meet that criterion. Radically lowering the requested level (e.g. to 0.4) will give you a result, so I don't think it's a bug; it's just that the quantity you want can't be estimated for this model. The reprex below is similar to @cderv's, but I had to adjust the cleaning to get things to work, so I've included the whole thing again, in addition to the plot.

url <- "https://forum.posit.co/uploads/default/original/2X/9/9429c5c23fed9dc09ef30508b8d837930062d5c7.pdf"
pdf_temp <- tempfile(fileext = ".pdf")
download.file(url, pdf_temp, mode = "wb")

## extract from pdf file
library(pdftools)
library(tidyverse)
data <- pdftools::pdf_text(pdf_temp) %>%
  read_lines() %>%
  str_trim() %>%
  str_split_fixed("[ ]+", n = 6) %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  set_names(nm = map(slice(., 1), as.character)) %>%
  slice(-1) %>%
  mutate_all(as.numeric)

## actual code ----

library(MASS)
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select

data$CHL <- as.factor(data$Challenge)
levels(data$CHL) <- c("No affected","Moderate","Affected")

summary(data)
#>    Individual         Age            gender          Distance   
#>  Min.   : 1.00   Min.   :20.00   Min.   :0.0000   Min.   : 529  
#>  1st Qu.:23.75   1st Qu.:21.00   1st Qu.:0.0000   1st Qu.:1107  
#>  Median :46.50   Median :22.00   Median :0.0000   Median :1367  
#>  Mean   :46.50   Mean   :22.61   Mean   :0.4674   Mean   :2345  
#>  3rd Qu.:69.25   3rd Qu.:24.00   3rd Qu.:1.0000   3rd Qu.:2079  
#>  Max.   :92.00   Max.   :29.00   Max.   :1.0000   Max.   :8890  
#>    Challenge         help                 CHL    
#>  Min.   :1.00   Min.   :0.0000   No affected:47  
#>  1st Qu.:1.00   1st Qu.:0.0000   Moderate   :32  
#>  Median :1.00   Median :1.0000   Affected   :13  
#>  Mean   :1.63   Mean   :0.5652                   
#>  3rd Qu.:2.00   3rd Qu.:1.0000                   
#>  Max.   :3.00   Max.   :1.0000

mo2 <- polr(formula = CHL ~ Age + Distance + gender + help, data,
            method = c("logistic"), Hess =TRUE)

summary(mo2)
#> Call:
#> polr(formula = CHL ~ Age + Distance + gender + help, data = data, 
#>     Hess = TRUE, method = c("logistic"))
#> 
#> Coefficients:
#>               Value Std. Error t value
#> Age       0.1023205  0.0246094   4.158
#> Distance  0.0005437  0.0002367   2.297
#> gender    0.4871502  0.4608772   1.057
#> help     -1.2882294  0.4567059  -2.821
#> 
#> Intercepts:
#>                      Value    Std. Error t value 
#> No affected|Moderate   2.9482   0.0048   610.0403
#> Moderate|Affected      5.7471   0.5225    10.9983
#> 
#> Residual Deviance: 137.6569 
#> AIC: 149.6569

confint(mo2, level = 0.4)
#> Waiting for profiling to be done...
#>                   30 %          70 %
#> Age       0.0415768227  0.1630394619
#> Distance  0.0004831186  0.0006068731
#> gender    0.2457709578  0.7286012025
#> help     -1.5294930492 -1.0494528431

confint(mo2, level = 0.9)
#> Waiting for profiling to be done...
#>                    5 %          95 %
#> Age                 NA            NA
#> Distance  0.0003605271  0.0007520464
#> gender   -0.2723704121  1.2474223473
#> help     -2.0553019090 -0.5450240340

plot(profile(mo2))

Created on 2018-12-23 by the reprex package (v0.2.1)
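
As a follow-up to the profiling explanation in this post, here is a rough sketch of two things one could try. First, it checks how far each coefficient's profile extends, compared with the normal cutoff for a 95% interval. Second, it computes Wald intervals (estimate plus or minus a normal quantile times the standard error), which do not depend on profiling. Wald intervals are a different, cruder method than the profile-likelihood intervals that confint() returns for polr, so this is a fallback, not a fix. The sketch uses the built-in housing data from the ?polr example as a stand-in, since the thread's data lives in a PDF. Reading the profile statistic from the first column of each profile element is an assumption about how MASS stores its profile objects.

```r
library(MASS)

# Stand-in model from the ?polr example (NOT the data from this thread)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
            Hess = TRUE)

## 1. How far does each profile extend? ----
pr <- profile(fit)

# Assumption: the first column of each profile element holds the signed
# profile statistic (the "tau" shown on the y axis of plot(profile(fit))).
z_range <- t(sapply(pr, function(p) range(p[[1]])))
colnames(z_range) <- c("min", "max")

cutoff <- qnorm(0.975)  # about 1.96 for a 95% interval
reaches_cutoff <- z_range[, "min"] <= -cutoff & z_range[, "max"] >= cutoff
cbind(z_range, reaches_cutoff)

## 2. Wald intervals as a fallback ----
ctab   <- coef(summary(fit))   # columns: Value, Std. Error, t value
slopes <- names(coef(fit))     # slope terms only, not the intercepts

wald_ci <- cbind(
  lower = ctab[slopes, "Value"] - cutoff * ctab[slopes, "Std. Error"],
  upper = ctab[slopes, "Value"] + cutoff * ctab[slopes, "Std. Error"]
)
wald_ci
```

Applied to the summary output above, the same arithmetic would give roughly 0.054 to 0.151 for Age (0.1023 plus or minus 1.96 times 0.0246). Treat that with caution: the very small standard error on the first intercept (0.0048) may indicate that the Hessian, and therefore every Wald standard error in this fit, is unreliable.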


Thank you very much Mr. Healy for your help. I appreciate it.

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.
