How to run survivalROC over multiple variables at once.

I have ~100 variables in a spreadsheet, and for each one I need to determine the best ROC cutoff and the AUC for 2-year overall survival (OS) using the survivalROC package. To clarify, I need the best ROC cutoff and AUC for each variable individually.

I'm using the code below to run ROC for one variable, but I'm not sure how to use a loop or something similar to run this for all 100 variables without copying and pasting the code 100x. Would anyone be able to help me? Thank you so much!

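library(survivalROC)

# Time-dependent ROC for 2-year (24-month) OS, Kaplan-Meier method
ROC.cutoff24 <- survivalROC(Stime = data$Time_OS,
                            status = data$OS,
                            marker = -data$Variable_1,
                            predict.time = 24,
                            method = 'KM')
# Best cutoff: the one that maximizes TP - FP (Youden's J)
ROC.cutoff24$cut.values[which.max(ROC.cutoff24$TP - ROC.cutoff24$FP)]
# Area under the time-dependent ROC curve
ROC.cutoff24[["AUC"]]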
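The which.max(TP - FP) line picks the cutoff that maximizes Youden's J: TP is the sensitivity and FP is 1 - specificity, so TP - FP equals sensitivity + specificity - 1. A toy illustration with made-up vectors (hypothetical values standing in for survivalROC output, not real results):

```r
# Hypothetical values standing in for ROC.cutoff24$cut.values, $TP and $FP
cut_values <- c(10, 20, 30, 40, 50)
tp <- c(1.00, 0.90, 0.70, 0.40, 0.10)  # sensitivity at each cutoff
fp <- c(1.00, 0.60, 0.25, 0.15, 0.05)  # 1 - specificity at each cutoff

# Youden's J = sensitivity + specificity - 1 = TP - FP
youden <- tp - fp
best_cut <- cut_values[which.max(youden)]
print(best_cut)  # 30, where TP - FP = 0.45 is largest
```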

99 others like this?

@technocrat Yes, that's correct! There are 99 similar variables in the spreadsheet.

And it's already imported into the data frame data? (BTW: naming user objects after built-ins like data and df will sooner or later throw an error. Whenever R can't see your data frame under that name (for example, before it has been created), it finds the built-in function instead, tries to subset the function, and yields the mysterious "object of type 'closure' is not subsettable" error.)
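A quick way to reproduce that error, assuming a fresh R session in which no object called data has been created yet, so the name resolves to the utils::data() function. os_data is only a hypothetical replacement name:

```r
# With no user object called `data`, the name resolves to utils::data(),
# so `$` is applied to a function (a closure) and fails
msg <- tryCatch(data$Time_OS, error = function(e) conditionMessage(e))
print(msg)

# A distinctive (hypothetical) name avoids the clash entirely
os_data <- data.frame(Time_OS = c(13, 25.8), OS = c(1, 1))
print(os_data$Time_OS)
```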

@technocrat Yes, the variables are already imported into the data frame "data", one column per variable. The data frame actually has a different name; I renamed it for privacy when posting the question to this forum.

That's OK, it was just a general warning. If you can't suitably anonymize the data to create a reprex (see the FAQ), I'll use the mayo dataset to demonstrate.
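A minimal sketch of the per-variable loop on the mayo data that ships with survivalROC, assuming its columns time, censor, mayoscore4 and mayoscore5 (censor is used as the event status, as in the package's own example). The 365-day horizon is an illustrative choice, not something from this thread:

```r
library(survivalROC)

# Load the example data bundled with survivalROC
data(mayo)

markers <- c("mayoscore4", "mayoscore5")

# One survivalROC() fit per marker column, evaluated at 365 days
fits <- lapply(markers, function(v) {
  survivalROC(Stime = mayo$time,
              status = mayo$censor,
              marker = mayo[[v]],
              predict.time = 365,
              method = "KM")
})
names(fits) <- markers

# Youden-optimal cutoff and AUC for each marker
res <- data.frame(
  variable = markers,
  cutoff = sapply(fits, function(o) o$cut.values[which.max(o$TP - o$FP)]),
  AUC = sapply(fits, function(o) o$AUC),
  row.names = NULL
)
print(res)
```

Adding a marker only means adding its column name to markers; the loop body stays the same.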

Thanks @technocrat! Here is a sample of my dataset, filled with random numbers.

Pt_number OS Time_OS Variable_1 Variable_2 Variable_3 Variable_4
1 1 13 29 82 1 48
2 1 25.8 28 62 98 13
3 1 2 96 21 33 16
4 1 6.7 57 100 17 68
5 1 3 28 14 18 15
6 0 60 26 52 19 63
7 1 2 4 42 95 57
8 1 15 37 43 4 100
9 0 0.6 5 84 86 15
10 1 11 2 8 25 12
11 1 24 97 39 44 39
12 0 12.5 23 42 39 21
13 1 20 47 99 54 62
14 1 71 21 70 52 73
15 1 53.2 16 47 8 2
16 1 27 42 63 74 68
17 0 33 45 12 86 37
18 1 25.2 44 22 88 38
19 1 9 60 66 33 49
20 1 12 53 73 19 7
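The sample above is whitespace-separated, so it can be read into R as-is with read.table(). os_data is a hypothetical object name chosen to avoid clashing with the data() function:

```r
# Paste the whitespace-separated sample straight into read.table()
os_data <- read.table(header = TRUE, text = "
Pt_number OS Time_OS Variable_1 Variable_2 Variable_3 Variable_4
1 1 13 29 82 1 48
2 1 25.8 28 62 98 13
3 1 2 96 21 33 16
4 1 6.7 57 100 17 68
5 1 3 28 14 18 15
6 0 60 26 52 19 63
7 1 2 4 42 95 57
8 1 15 37 43 4 100
9 0 0.6 5 84 86 15
10 1 11 2 8 25 12
11 1 24 97 39 44 39
12 0 12.5 23 42 39 21
13 1 20 47 99 54 62
14 1 71 21 70 52 73
15 1 53.2 16 47 8 2
16 1 27 42 63 74 68
17 0 33 45 12 86 37
18 1 25.2 44 22 88 38
19 1 9 60 66 33 49
20 1 12 53 73 19 7
")
str(os_data)
```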

Help me out. Is this intended?

Are you trying to lag the observations by row?

If we can fix the reprex below, it won't be hard to turn it into a function that returns a list of survivalROC results, sweeping in all the variables with a one-liner.
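With ~100 columns, selecting them by name pattern is less fragile than hard-coding positions. A minimal sketch, assuming the real marker columns share a Variable_ prefix as in the anonymized sample (os_data and marker_cols are hypothetical names):

```r
# Toy frame with the same column layout as the sample (values truncated)
os_data <- data.frame(
  Pt_number = 1:3, OS = c(1, 1, 1), Time_OS = c(13, 25.8, 2),
  Variable_1 = c(29, 28, 96), Variable_2 = c(82, 62, 21)
)

# Pick the marker columns by name rather than by position, so the same
# code works for 4 or 100 variables
marker_cols <- grep("^Variable_", names(os_data), value = TRUE)
print(marker_cols)
```

os_data[marker_cols] is then a data frame holding just those columns, which lapply() or sapply() walk one column at a time.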

library(survivalROC)

data <- data.frame(
  Pt_number = c(
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
    11, 12, 13, 14, 15, 16, 17, 18, 19, 20
  ),
  OS = c(
    1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1,
    0, 1, 1, 1, 1, 0, 1, 1, 1
  ),
  Time_OS = c(
    13, 25.8, 2, 6.7, 3, 60, 2, 15, 0.6, 11, 24,
    12.5, 20, 71, 53.2, 27, 33, 25.2, 9, 12
  ),
  Variable_1 = c(
    29, 28, 96, 57, 28, 26, 4, 37, 5,
    2, 97, 23, 47, 21, 16, 42, 45, 44, 60, 53
  ),
  Variable_2 = c(
    82, 62, 21, 100, 14, 52, 42, 43,
    84, 8, 39, 42, 99, 70, 47, 63, 12, 22, 66, 73
  ),
  Variable_3 = c(
    1, 98, 33, 17, 18, 19, 95, 4, 86,
    25, 44, 39, 54, 52, 8, 74, 86, 88, 33, 19
  ),
  Variable_4 = c(
    48, 13, 16, 68, 15, 63, 57, 100,
    15, 12, 39, 21, 62, 73, 2, 68, 37, 38, 49, 7
  )
)

ROC.cutoff24<-survivalROC(Stime=data$Time_OS,
                          status=data$OS,
                          marker=-data$variable_1,
                          predict.time=24,
                          method = 'KM')
#> Error in -data$variable_1: invalid argument to unary operator
ROC.cutoff24$cut.values[which.max(ROC.cutoff24$TP-ROC.cutoff24$FP)]
#> Error in eval(expr, envir, enclos): object 'ROC.cutoff24' not found
ROC.cutoff24[["AUC"]]
#> Error in eval(expr, envir, enclos): object 'ROC.cutoff24' not found

Created on 2023-01-28 with reprex v2.0.2

Hi @technocrat. Thank you so much for helping me out! I'm confused by your question. Each variable is a separate column, so it's more accurate to say that I want to run this function over columns Variable_1 to Variable_4: I'm trying to determine the best ROC cutoff point for Variable_1, the best ROC cutoff point for Variable_2, and so on. I also capitalized "Variable_1" in my original code to keep the capitalization consistent. I think the lower-case name in the line below was probably throwing the error.

                    marker=-data$variable_1,

@technocrat I'm trying to find the best ROC cutoff over all the patients for Variable_1, the best ROC cutoff over all the patients for Variable_2, and so on, so I would like to iterate through columns Variable_1 to Variable_4.
Also, I think that line was throwing an error because I didn't capitalize "Variable_1" in my original code, while it is capitalized in the data set. Thanks again for all the help!
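The error in the reprex does come from case sensitivity: data$variable_1 does not match the column Variable_1, so it returns NULL, and negating NULL fails. A small illustration on a hypothetical one-column data frame:

```r
df_demo <- data.frame(Variable_1 = c(29, 28, 96))

# Column names are case-sensitive: the lower-case name matches nothing
print(is.null(df_demo$variable_1))       # TRUE
print("variable_1" %in% names(df_demo))  # FALSE

# Negating NULL is what triggers "invalid argument to unary operator"
msg <- tryCatch(-df_demo$variable_1, error = function(e) conditionMessage(e))
print(msg)
```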

Ok, so the function works with the correction?

Yes, it definitely works with just one variable like this.


OK, I'm having problems getting it to work, so I'll pass it back to you.

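library(survivalROC)

# Youden-optimal cutoff and AUC at 24 months for one marker column x;
# d defaults to the data frame called data in this thread
get_co24 <- function(x, d = data) {
  o <- survivalROC(
    Stime = d$Time_OS,
    status = d$OS,
    marker = -x,          # negated, as in the original single-variable call
    predict.time = 24,
    method = "KM"
  )
  # use o (not ROC.cutoff24) and return both values, not just the last one
  c(cutoff = o$cut.values[which.max(o$TP - o$FP)],
    AUC = o$AUC)
}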

then

l <- lapply(data[4:7], get_co24)

will get you a named list with the cutoff and AUC for each of Variable_1 to Variable_4 (columns 4 to 7).
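For ~100 variables, a results table may be more convenient than a list. A sketch under the same assumptions as the thread (24-month horizon, KM method, negated marker), run on the reprex data; roc_summary, os_data and marker_cols are hypothetical names, and the marker columns are assumed to share the Variable_ prefix:

```r
library(survivalROC)

# Reprex data from this thread (renamed to avoid clashing with utils::data)
os_data <- data.frame(
  OS = c(1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1),
  Time_OS = c(13, 25.8, 2, 6.7, 3, 60, 2, 15, 0.6, 11, 24,
              12.5, 20, 71, 53.2, 27, 33, 25.2, 9, 12),
  Variable_1 = c(29, 28, 96, 57, 28, 26, 4, 37, 5, 2,
                 97, 23, 47, 21, 16, 42, 45, 44, 60, 53),
  Variable_2 = c(82, 62, 21, 100, 14, 52, 42, 43, 84, 8,
                 39, 42, 99, 70, 47, 63, 12, 22, 66, 73),
  Variable_3 = c(1, 98, 33, 17, 18, 19, 95, 4, 86, 25,
                 44, 39, 54, 52, 8, 74, 86, 88, 33, 19),
  Variable_4 = c(48, 13, 16, 68, 15, 63, 57, 100, 15, 12,
                 39, 21, 62, 73, 2, 68, 37, 38, 49, 7)
)

# One row per marker: Youden-optimal cutoff and AUC at the given horizon
roc_summary <- function(v, d, horizon = 24) {
  o <- survivalROC(Stime = d$Time_OS, status = d$OS,
                   marker = -d[[v]],  # negated, as in the original call
                   predict.time = horizon, method = "KM")
  data.frame(variable = v,
             cutoff = o$cut.values[which.max(o$TP - o$FP)],
             AUC = o$AUC)
}

marker_cols <- grep("^Variable_", names(os_data), value = TRUE)
results <- do.call(rbind, lapply(marker_cols, roc_summary, d = os_data))
print(results)
# write.csv(results, "roc_results.csv", row.names = FALSE)  # optional export
```

Because the marker is negated, the reported cutoff is on the negated scale; flip its sign to read it in the variable's original units.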

Report back?
