How to resolve an issue of large confidence intervals while running CoxPH analysis?

Adam52 · May 28, 2023, 3:09pm

Hello, I am running into an issue while performing CoxPH analysis using the following sample dataset:

structure(list(Systemic.Tx...2.classification..Chemotherapy..PD1.monotherapy..PD.1.CTLA.4.combo..PD.1.chemo..targetted.Tx..targetted.chemo.combo..etc.
 = c("Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted/chemo combo", "Targetted Tx", "Targetted Tx",  "Targetted Tx"), Time.on.systemic.Tx =
 c("2.069815195", "2.332648871",  "2.069815195", "1.215605749",
 "2.661190965", "0.689938398", "1.839835729",  "2.858316222",
 "0.657084189", "2.529774127", "1.80698152", "3.482546201", 
 "2.891170431", "3.515400411", "2.431211499", "3.515400411",
 "1.347022587",  "5.519507187", "17.47843943", "26.90759754",
 "6.176591376", "5.979466119",  "8.246406571", "15.40862423",
 "5.749486653", "6.242299795", "5.683778234",  "6.636550308",
 "10.15195072", "10.0862423", "18.52977413", "5.749486653", 
 "10.7761807", "6.965092402"), PFS2 = c(2.595482546, 2.37, 2.069815195, 
1.412731006, 1.938398357, 0.657084189, 2.529774127, 3.219712526, 
 0.657084189, 2.529774127, 2.2, 3.482546201, 2.529774127, 3.712525667, 
 2.234086242, 3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532, 
 7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187, 
 5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388, 
 18.20123203, 6.110882957, 10.61190965, 6.866529774), PFS2_event = c(1,  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1,  1, 1,
 1, 1, 1, 0, 1, 1, 0, 1, 1, 1), Binarised_Time.on.Tx.2 = c("≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
 months",  "> 3.52 months")), row.names = c(NA, -34L), class =
 "data.frame")

And here is the code I am using for this analysis:

fit1 <- coxph(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data = Test_Dataset)
summary(fit1)

I receive the following warning after running this code:

Warning message:
In coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
  Loglik converged before variable 1 ; coefficient may be infinite.

And more importantly, I am getting incorrect results: the confidence interval runs from 0 to Inf, and the coefficient and p-values are very high. I have run this analysis for Overall Survival using the same dataset and it worked without any issues. Any suggestions as to what might be driving this issue with my PFS2 values?

Thank you!

technocrat · May 29, 2023, 9:15am

I can take a look if you retry the dput(); this doesn't look right:

> str(d)
'data.frame':   34 obs. of  5 variables:
 $ Systemic.Tx...2.classification..Chemotherapy..PD1.monotherapy..PD.1.CTLA.4.combo..PD.1.chemo..targetted.Tx..targetted.chemo.combo..etc.: chr  "Targetted Tx" "Targetted Tx" "Targetted Tx" "Targetted Tx" ...
 $ Time.on.systemic.Tx: chr  "2.069815195" "2.332648871" "2.069815195" "1.215605749" ...
 $ PFS2: num  2.6 2.37 2.07 1.41 1.94 ...
 $ PFS2_event: num  1 1 1 1 1 1 1 1 1 1 ...
 $ Binarised_Time.on.Tx.2: chr  "≤ 3.52\nmonths" "≤ 3.52 months" "≤ 3.52 months" "≤ 3.52 months" ...

d[1] is obviously malformed. I'd also expect d[2] to be numeric (although that may be part of the mess in d[1]) and d[5] to be a factor (it seems unlikely that character strings would be helpful as the right-hand term in the formula, especially with newline characters).
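A minimal sketch of the type repairs described above, run on a small hypothetical data frame (`toy`, with made-up column names `time_on_tx` and `bin_label`) that mimics the two problems in the posted data: numbers stored as character strings and labels containing an embedded newline. Collapsing every run of whitespace, rather than only "\n", also catches a space left after the line break.

```r
# Hypothetical toy data reproducing the two problems seen in str(d):
# numbers stored as strings, and labels containing an embedded newline.
toy <- data.frame(
  time_on_tx = c("2.069815195", "0.689938398", "5.519507187"),
  bin_label  = c("\u2264 3.52\n months", "\u2264 3.52 months", "> 3.52 months"),
  stringsAsFactors = FALSE
)

# Convert the character column to numeric.
toy$time_on_tx <- as.numeric(toy$time_on_tx)

# Replace every run of whitespace (including "\n") with a single space,
# then store the cleaned labels as a factor.
toy$bin_label <- factor(gsub("[[:space:]]+", " ", toy$bin_label))

str(toy)
levels(toy$bin_label)
```

The same two lines (as.numeric() and factor(gsub(...))) would apply unchanged to the real columns.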

Adam52 · May 29, 2023, 9:34am

Thanks, I've just revised the code in my question above. I think I might be facing the Hauck-Donner effect, and I was wondering whether there might still be a way to perform a CoxPH analysis using the binary variable in my code?

technocrat · May 29, 2023, 10:13am

The dput() wasn't the problem I thought it was, and I can reproduce the result. I did adjust the treatment variable, which had some inconsistent coding, and that had the effect of reducing the number of coefficients to just one. The coefficient is, however, quite large and results in a difference of 10 orders of magnitude, which could cause the CI calculation to crap out due to floating-point limits.

As a matter of form, the arguments to Surv() and coxph() are consistent with those given in the examples. I can only think, then, that there is something about the data, or the way it was recorded, that is at fault.
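One data property that would produce exactly this warning is complete separation in time. The binary variable is a cut of Time.on.systemic.Tx at 3.52 months, and in this sample PFS2 tracks Time.on.systemic.Tx closely, so every PFS2 time in the "≤ 3.52 months" group (the first 17 rows) is shorter than every PFS2 time in the "> 3.52 months" group. When that happens the Cox partial log-likelihood keeps increasing as the coefficient grows, so the maximum-likelihood estimate is infinite rather than merely large. The sketch below checks both points in base R; the vectors are copied from the dput() above, and the helper `partial_loglik` is a hypothetical illustration using the Breslow approximation for ties (coxph() defaults to Efron), so its values will not match coxph()'s log-likelihood exactly.

```r
# PFS2 times and event indicators copied from the posted dput().
pfs2 <- c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357,
          0.657084189, 2.529774127, 3.219712526, 0.657084189, 2.529774127,
          2.2, 3.482546201, 2.529774127, 3.712525667, 2.234086242,
          3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
          7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
          5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
          18.20123203, 6.110882957, 10.61190965, 6.866529774)
event <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1)
# 1 = "≤ 3.52 months" (first 17 rows), 0 = "> 3.52 months" (last 17 rows)
grp <- rep(c(1, 0), each = 17)

# Separation check: PFS2 range within each group.
tapply(pfs2, grp, range)
max(pfs2[grp == 1]) < min(pfs2[grp == 0])   # TRUE means complete separation

# Cox partial log-likelihood for one covariate (Breslow ties).
partial_loglik <- function(beta, time, status, z) {
  ll <- 0
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]   # subjects still at risk at this event time
    ll <- ll + z[i] * beta - log(sum(exp(beta * z[at_risk])))
  }
  ll
}

# Evaluate it at increasing coefficient values.
betas <- c(0, 1, 2, 5, 10)
ll <- sapply(betas, partial_loglik, time = pfs2, status = event, z = grp)
data.frame(beta = betas, loglik = ll)
# A log-likelihood that keeps rising with beta has its maximum at +Inf.
```

If the log-likelihood column increases at every step, the huge coefficient reported by coxph() is simply where the optimiser stopped, which fits the "coefficient may be infinite" warning.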

Try running my notes here:

library(survival)
d <- structure(list(Systemic.Tx...2.classification..Chemotherapy..PD1.monotherapy..PD.1.CTLA.4.combo..PD.1.chemo..targetted.Tx..targetted.chemo.combo..etc.
= c("Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
"Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
"Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
"Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
"Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
"Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
"Targetted Tx",  "Targetted Tx", "Targetted/chemo combo", "Targetted Tx", "Targetted Tx",  "Targetted Tx"), 
Time.on.systemic.Tx =
c("2.069815195", "2.332648871",  "2.069815195", "1.215605749",
"2.661190965", "0.689938398", "1.839835729",  "2.858316222",
"0.657084189", "2.529774127", "1.80698152", "3.482546201", 
"2.891170431", "3.515400411", "2.431211499", "3.515400411",
"1.347022587",  "5.519507187", "17.47843943", "26.90759754",
"6.176591376", "5.979466119",  "8.246406571", "15.40862423",
"5.749486653", "6.242299795", "5.683778234",  "6.636550308",
"10.15195072", "10.0862423", "18.52977413", "5.749486653", 
"10.7761807", "6.965092402"), 
PFS2 = c(2.595482546, 2.37, 2.069815195, 
1.412731006, 1.938398357, 0.657084189, 2.529774127, 3.219712526, 
0.657084189, 2.529774127, 2.2, 3.482546201, 2.529774127, 3.712525667, 
2.234086242, 3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532, 
7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187, 
5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388, 
18.20123203, 6.110882957, 10.61190965, 6.866529774), PFS2_event = c(1,  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1,  1, 1,
                                                               1, 1, 1, 0, 1, 1, 0, 1, 1, 1), 
                                                               Binarised_Time.on.Tx.2 = c("≤ 3.52
months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
months",  "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52
months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
months",  "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52
months",  "> 3.52 months")), row.names = c(NA, -34L), class =
"data.frame")

# discard unused variables for convenience
d <- d[-c(1:2)]
# and rename right hand side of formula
# for convenience
colnames(d)[3] <- "bin_time"
# repair bin_time
newline <- "\n"
d$bin_time <- gsub(newline," ",d$bin_time)
d$bin_time |> unique()
# conform punctuation
pat <- "2m"
d$bin_time <- gsub(pat,"2 m",d$bin_time)
# reproduce issue but with fewer spurious coefficients
fit1 <- coxph(Surv(PFS2, PFS2_event) ~ bin_time, data = d) 
summary(fit1)
# Inf is due to value of coefficient for bin_time ≤ 3.52 months
(cf <- summary(fit1)$coefficients[1])
# 18 orders of magnitude different
exp(cf)
exp(-cf)

# look at arguments
(s <- Surv(d$PFS2,d$PFS2_event)) 
table(d$bin_time)

# how does our Surv object compare to example?
# aml dataset
head(aml)
# our time variable is fractional
# what if we make it integers?
d$PFS2 <- round(d$PFS2,0)
fit1 <- coxph(Surv(PFS2, PFS2_event) ~ bin_time, data = d) 
summary(fit1)
# no difference
# convert treatment variable to numeric representation?
d$bin_time <- ifelse(d$bin_time == "≤ 3.52 months",0,1)
fit1 <- coxph(Surv(PFS2, PFS2_event) ~ bin_time, data = d) 
summary(fit1)
# no difference either
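Since recoding does not change the fit, one option for comparing the two groups is a test that does not need a finite coefficient. The sketch below runs a log-rank test with survival::survdiff() on the same PFS2 data (vectors copied from the dput() above; the names `pfs2`, `event`, `grp` and `lr` are just for illustration). The "Score (logrank) test" line printed by summary(fit1) is computed at a coefficient of zero, so it is essentially the same test and is also unaffected by the divergence. If a hazard ratio estimate is needed, Firth's penalized likelihood, implemented in the coxphf package, is a commonly used remedy for monotone likelihood; it is not run here.

```r
library(survival)

# PFS2 times and event indicators copied from the posted dput().
pfs2 <- c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357,
          0.657084189, 2.529774127, 3.219712526, 0.657084189, 2.529774127,
          2.2, 3.482546201, 2.529774127, 3.712525667, 2.234086242,
          3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
          7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
          5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
          18.20123203, 6.110882957, 10.61190965, 6.866529774)
event <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1)
# Group labels in row order: first 17 rows "≤ 3.52 months", last 17 "> 3.52 months".
grp <- factor(rep(c("\u2264 3.52 months", "> 3.52 months"), each = 17))

# Log-rank test: compares the two survival curves without estimating a
# hazard ratio, so it does not depend on the diverging Cox coefficient.
lr <- survdiff(Surv(pfs2, event) ~ grp)
lr

# p-value from the chi-squared statistic (2 groups -> 1 degree of freedom).
p_value <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)
p_value
```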

system · July 10, 2023, 10:13am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.