# Confidence Interval: getting "NA" as an answer. Why?

#1

Hello everyone,
I have a problem calculating my confidence intervals. I fitted a `polr` regression and wanted to look at the confidence intervals of my explanatory variables. I get confidence intervals for all variables except "Age", where I get "NA" instead.
Does anyone know why I am getting "NA" and how I can solve this problem?
(By the way, my variable "Age" has 92 observations, and the individuals are between 20 and 29 years old.)

``````
library(MASS)

CHL = as.factor(data$Challenge)
levels(CHL) = c("No affected", "Moderate", "Affected")

mo2 = polr(formula = CHL ~ Age + Distance + gender + help, data, method = c("logistic"), Hess = TRUE)
summary(mo2)

(ci = confint(mo2))
``````
``````
>                 2.5 %        97.5 %
> Age                NA            NA
> Distance  0.000327371  0.0007957107
> gender   -0.419290911  1.3946316454
> help     -2.206255646 -0.4040815382
``````

Thank you very much for your time and help.

#2

I may have missed something, but what is your data? Where can we find `data$Challenge`?

Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

#3

Unfortunately, the system doesn't allow me to upload my data, not even as a PDF...

#4

Individual Age gender Distance Challenge
1 22 1 1107 2
2 21 0 923 1
3 21 0 1107 1
4 20 0 1107 1
5 27 1 690 1
6 22 0 1367 1
7 24 1 1726 2
8 20 0 1107 2
9 23 0 1107 1
10 23 0 2381 2
11 22 1 1174 2
12 21 0 923 1
13 20 1 1107 2
14 24 0 690 1
15 28 0 7221 3
16 25 0 690 1
17 21 0 8890 3
18 22 1 785 2
19 23 0 1107 2
20 21 0 1107 1
21 21 0 1107 1
22 21 0 1107 1
23 23 0 690 2
24 24 1 1904 1
25 25 1 5930 3
26 25 0 529 1
27 26 0 4461 3
28 21 0 1183 1
29 22 1 5930 1
30 20 0 1298 2
31 24 1 614 1
32 25 1 3116 3
33 25 1 3421 2
34 26 1 4213 2
35 24 0 1679 1
36 23 1 8145 3
37 23 1 3116 2
38 22 1 765 1
39 22 0 1107 1
40 24 0 1866 2
41 24 1 5930 2
42 25 1 1054 1
43 23 1 5930 3
44 23 0 1183 2
45 22 1 1367 1
46 21 0 1367 1
47 24 1 1978 1
48 21 0 1107 1
49 24 1 8890 3
50 23 1 1107 2
51 23 0 1367 1
52 23 1 1183 2
53 23 0 8890 2
54 22 0 8890 3
55 23 1 1367 1
56 21 0 1107 1
57 21 0 4511 2
58 21 1 1107 1
59 25 1 1367 1
60 25 1 7630 2
61 24 0 1367 3
62 20 0 1367 2
63 21 0 1367 1
64 23 1 1317 2
65 21 0 923 1
66 20 0 1174 1
67 20 1 1367 2
68 22 1 5373 2
69 20 1 1107 2
70 24 0 1183 1
71 20 0 1367 1
72 21 0 1367 2
73 22 1 1726 1
74 23 0 1183 2
75 20 0 1904 1
76 22 0 2381 1
77 25 0 1183 1
78 21 1 8890 2
79 22 0 1183 2
80 26 1 1183 2
81 20 1 1107 2
82 25 0 1866 1
83 22 1 1367 1
84 20 1 1183 1
85 20 1 1298 1
86 20 0 1367 1
87 24 0 1183 1
88 23 1 1726 1
89 20 1 1367 3
90 25 0 1183 1
91 22 1 3421 3
92 29 1 3886 3

#5

There is no `help` variable in this data, but there is one in your formula.

There is some guidance on how best to provide data, but what you did is OK: I was able to copy it and use `datapasta::tribble_paste` to get usable data in R.
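
For anyone unsure how to share a data frame as text, one common option is base R's `dput()`, which prints R code that recreates the object. The sketch below is only an illustration: `toy` is a hypothetical name, and it holds just the first three rows posted in #4.

``````r
# First three rows of the data posted above ("toy" is a hypothetical name)
toy <- data.frame(
  Individual = 1:3,
  Age        = c(22, 21, 21),
  gender     = c(1, 0, 0),
  Distance   = c(1107, 923, 1107),
  Challenge  = c(2, 1, 1)
)

# Prints R code that can be pasted straight into a forum post
dput(toy)

# Anyone can rebuild the data frame by evaluating the pasted text
shared  <- capture.output(dput(toy))
rebuilt <- eval(parse(text = shared))
``````

Pasting the `dput()` output inside a code block lets readers recreate the data with a single assignment, with no file upload needed.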

#6

test.pdf (42.7 KB)

I am so sorry, Sir. This is the data; I forgot to include the variable `help`.

#7

Can you please run `summary` on your data and show the output?

``````
summary(data)
``````

That will show us if there are any missing values in the source data and give us a better feel of what your full data set looks like.
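
If you want an explicit count of missing values per column in addition to `summary()`, a minimal sketch looks like this. The data frame `toy` is a hypothetical stand-in, not the data from this thread.

``````r
# Hypothetical stand-in data with one missing Age value
toy <- data.frame(
  Age    = c(22, 21, NA, 20),
  gender = c(1, 0, 0, 1)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Rows that have no missing values at all
complete_rows <- toy[complete.cases(toy), ]
``````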

#8

Of course. The summary looks as follows:

``````
> summary(data)
Individual         Age            gender          Distance      Challenge         help
Min.   : 1.00   Min.   :20.00   Min.   :0.0000   Min.   : 529   Min.   :1.00   Min.   :0.0000
1st Qu.:23.75   1st Qu.:21.00   1st Qu.:0.0000   1st Qu.:1107   1st Qu.:1.00   1st Qu.:0.0000
Median :46.50   Median :22.00   Median :0.0000   Median :1367   Median :1.00   Median :1.0000
Mean   :46.50   Mean   :22.61   Mean   :0.4674   Mean   :2345   Mean   :1.63   Mean   :0.5652
3rd Qu.:69.25   3rd Qu.:24.00   3rd Qu.:1.0000   3rd Qu.:2079   3rd Qu.:2.00   3rd Qu.:1.0000
Max.   :92.00   Max.   :29.00   Max.   :1.0000   Max.   :8890   Max.   :3.00   Max.   :1.0000
``````

#9

Thanks for the data. PDF is not the preferred format, but here is a full reprex for people who want to chime in:

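``````
# get data
# The PDF was never downloaded in the posted code; the URL is assumed to be
# the same upload link that appears in the reprex in post #10
url <- "https://community.rstudio.com/uploads/default/original/2X/9/9429c5c23fed9dc09ef30508b8d837930062d5c7.pdf"
pdf_temp <- tempfile(fileext = ".pdf")
download.file(url, pdf_temp, mode = "wb")

## extract from pdf file
library(pdftools)
library(tidyverse)
data <- pdftools::pdf_text(pdf_temp) %>%
  str_trim() %>%
  str_split_fixed("[ ]+", n = 6) %>%
  as_tibble() %>%
  set_names(nm = slice(., 1)) %>%
  slice(-1) %>%
  mutate_all(as.numeric)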

## actual code ----

library(MASS)
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#>
#>     select

CHL = as.factor(data$Challenge)
levels(CHL) = c("No affected", "Moderate", "Affected")

mo2=polr(formula=CHL ~ Age + Distance + gender + help, data, method = c("logistic"), Hess =TRUE)
summary(mo2)
#> Call:
#> polr(formula = CHL ~ Age + Distance + gender + help, data = data,
#>     Hess = TRUE, method = c("logistic"))
#>
#> Coefficients:
#>               Value Std. Error t value
#> Age       0.1023205  0.0246094   4.158
#> Distance  0.0005437  0.0002367   2.297
#> gender    0.4871502  0.4608772   1.057
#> help     -1.2882294  0.4567059  -2.821
#>
#> Intercepts:
#>                      Value    Std. Error t value
#> No affected|Moderate   2.9482   0.0048   610.0403
#> Moderate|Affected      5.7471   0.5225    10.9983
#>
#> Residual Deviance: 137.6569
#> AIC: 149.6569

(ci = confint(mo2))
#> Waiting for profiling to be done...
#>                 2.5 %        97.5 %
#> Age                NA            NA
#> Distance  0.000327371  0.0007957107
#> gender   -0.419290911  1.3946316454
#> help     -2.206255646 -0.4040815382
``````

Created on 2018-12-23 by the reprex package (v0.2.1)

I don't know the `polr` model very well, so I'm not sure whether `NA` is expected or not...

#10

I think the profiling is failing in `confint()` for the `Age` variable. `confint()` on a `polr` fit profiles the likelihood, and there's a diagnostic plot for the profile that shows the parameter `tau` for each coefficient. The profile has to span a wide enough range for the requested confidence level (0.95, 0.9, etc.), or else the interval can't be calculated. It looks like `Age` doesn't meet that criterion. Radically lowering the requested level (e.g. to 0.4) does give a result, so I don't think it's a bug; it's just that the quantity you want can't be estimated for this model. The reprex below is similar to @cderv's, but I had to adjust the cleaning to get things to work, so I've included the whole thing again, along with the plot.

``````
url <- "https://community.rstudio.com/uploads/default/original/2X/9/9429c5c23fed9dc09ef30508b8d837930062d5c7.pdf"
pdf_temp <- tempfile(fileext = ".pdf")
# fetch the PDF into the temp file (this step was missing from the posted code)
download.file(url, pdf_temp, mode = "wb")

## extract from pdf file
library(pdftools)
library(tidyverse)
data <- pdftools::pdf_text(pdf_temp) %>%
  str_trim() %>%
  str_split_fixed("[ ]+", n = 6) %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  set_names(nm = map(slice(., 1), as.character)) %>%
  slice(-1) %>%
  mutate_all(as.numeric)

## actual code ----

library(MASS)
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#>
#>     select

data$CHL <- as.factor(data$Challenge)
levels(data$CHL) <- c("No affected", "Moderate", "Affected")

summary(data)
#>    Individual         Age            gender          Distance
#>  Min.   : 1.00   Min.   :20.00   Min.   :0.0000   Min.   : 529
#>  1st Qu.:23.75   1st Qu.:21.00   1st Qu.:0.0000   1st Qu.:1107
#>  Median :46.50   Median :22.00   Median :0.0000   Median :1367
#>  Mean   :46.50   Mean   :22.61   Mean   :0.4674   Mean   :2345
#>  3rd Qu.:69.25   3rd Qu.:24.00   3rd Qu.:1.0000   3rd Qu.:2079
#>  Max.   :92.00   Max.   :29.00   Max.   :1.0000   Max.   :8890
#>    Challenge         help                 CHL
#>  Min.   :1.00   Min.   :0.0000   No affected:47
#>  1st Qu.:1.00   1st Qu.:0.0000   Moderate   :32
#>  Median :1.00   Median :1.0000   Affected   :13
#>  Mean   :1.63   Mean   :0.5652
#>  3rd Qu.:2.00   3rd Qu.:1.0000
#>  Max.   :3.00   Max.   :1.0000

mo2 <- polr(formula = CHL ~ Age + Distance + gender + help, data,
            method = c("logistic"), Hess = TRUE)

summary(mo2)
#> Call:
#> polr(formula = CHL ~ Age + Distance + gender + help, data = data,
#>     Hess = TRUE, method = c("logistic"))
#>
#> Coefficients:
#>               Value Std. Error t value
#> Age       0.1023205  0.0246094   4.158
#> Distance  0.0005437  0.0002367   2.297
#> gender    0.4871502  0.4608772   1.057
#> help     -1.2882294  0.4567059  -2.821
#>
#> Intercepts:
#>                      Value    Std. Error t value
#> No affected|Moderate   2.9482   0.0048   610.0403
#> Moderate|Affected      5.7471   0.5225    10.9983
#>
#> Residual Deviance: 137.6569
#> AIC: 149.6569

confint(mo2, level = 0.4)
#> Waiting for profiling to be done...
#>                   30 %          70 %
#> Age       0.0415768227  0.1630394619
#> Distance  0.0004831186  0.0006068731
#> gender    0.2457709578  0.7286012025
#> help     -1.5294930492 -1.0494528431

confint(mo2, level = 0.9)
#> Waiting for profiling to be done...
#>                    5 %          95 %
#> Age                 NA            NA
#> Distance  0.0003605271  0.0007520464
#> gender   -0.2723704121  1.2474223473
#> help     -2.0553019090 -0.5450240340

plot(profile(mo2))
``````

Created on 2018-12-23 by the reprex package (v0.2.1)
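
If an interval for `Age` is still needed, two things that may be worth trying are sketched below. Neither comes from the posts above, and neither is guaranteed to help with this data: putting the predictors on friendlier scales (centering `Age`, measuring `Distance` in thousands), which sometimes makes the fit and its profile better behaved, and computing Wald intervals with `confint.default()`, which uses the coefficient estimates and the Hessian-based standard errors instead of profiling. The data here are simulated stand-ins, and the names `sim`, `Age_c`, `Distance_km`, `fit` and `wald_ci` are hypothetical.

``````r
library(MASS)

# Simulated stand-in data (NOT the survey data from this thread)
set.seed(1)
n <- 200
sim <- data.frame(
  Age      = sample(20:29, n, replace = TRUE),
  Distance = round(runif(n, 500, 9000))
)
latent  <- 0.1 * sim$Age + 0.0005 * sim$Distance + rlogis(n)
sim$CHL <- cut(latent, breaks = c(-Inf, 4, 6, Inf),
               labels = c("No affected", "Moderate", "Affected"),
               ordered_result = TRUE)

# Put the predictors on comparable scales before fitting
sim$Age_c       <- sim$Age - mean(sim$Age)   # centered age
sim$Distance_km <- sim$Distance / 1000       # distance in thousands

fit <- polr(CHL ~ Age_c + Distance_km, data = sim,
            method = "logistic", Hess = TRUE)

# Wald intervals: estimate +/- normal quantile * standard error (no profiling)
wald_ci <- confint.default(fit, level = 0.95)
print(wald_ci)
``````

Wald intervals rely on a normal approximation, so they can differ noticeably from profile-likelihood intervals in small samples; treat them as a fallback, not a like-for-like replacement.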

#11

Thank you very much Mr. Healy for your help. I appreciate it.

#12

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.
