Assessing model performance: Likelihood Ratios

Hi all,

I saw some papers that assess predictive model performance using something called "likelihood ratio analysis". Below is an example:

"The likelihood ratios (LRs) of cut-offs for a positive test defined using the population within each risk group were calculated. The following categories for interpretation of the LRs were used: informative (LR<0·1 or >10); moderately informative (LR 0·1–0·2 or 5–10); and non-informative (LR 0·2–5)." Reference: A Risk Prediction Model for the Assessment and Triage of Women with Hypertensive Disorders of Pregnancy in Low-Resourced Settings: The miniPIERS (Pre-eclampsia Integrated Estimate of RiSk) Multi-country Prospective Cohort Study

Can someone pls teach me how this is done (e.g. what R package I should use and how I can build my code/syntax)?

Thank you in advance!

Where to begin depends on level of experience with logistic regression, which, at a minimum, should be at the level described in this post.

Begin by downloading the data and constructing a simple glm model and calculating odds ratios.

Yes, I am familiar with logistic regression (e.g. odds ratio). What I don't understand is how I can define the high vs low predicted probability cutoff in my data. Any advice is much appreciated.

Again, I suggest downloading the data and making a model for each group. Then post a reprex regarding questions on calculating the LRs.

Likelihood ratio analysis is a way to compare two models, especially if the models are nested. For example, if model 1 has terms A and B and model 2 just has A, a likelihood ratio test (LRT) gets the likelihood for each model and compares them.

The likelihood can be thought of as a measure of how well the parameters fit the data. As an analogy, in simple linear regression the likelihood is closely tied to the sum of squared errors (SSE): with normally distributed errors, a smaller SSE means a larger likelihood. I'll use that as an example below; the likelihoods for other distributions are not as intuitive but work basically the same way.

For the two models, suppose the model 1 SSE is 10.0 and the model 2 SSE is 12.0. Since the fit is "slightly better" (lower SSE) when predictor B is included, you might think that model 1 is better. The question is how much better is good enough.

The math behind the LRT looks at the change in the likelihood relative to the number of terms by which the two models differ (a.k.a. the degrees of freedom). If B is numeric, there is one fewer parameter in model 2, so it is a single degree of freedom test. With this, and the sample size, we can see if a difference of 2.0 is significant (say via a p-value).
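
To make the SSE example concrete, here is a minimal sketch in Python (the same arithmetic carries over to R). It assumes normally distributed errors, in which case the likelihood ratio statistic reduces to n * log(SSE_reduced / SSE_full). The sample size n = 50 is made up for illustration, since the example above does not give one, and the function name `gaussian_lrt` is hypothetical. The p-value uses the large-sample chi-square approximation with 1 degree of freedom.

```python
import math


def gaussian_lrt(sse_full, sse_reduced, n):
    """Likelihood ratio test for two nested Gaussian linear models
    that differ by exactly one parameter (1 degree of freedom).

    sse_full    -- SSE of the larger model (terms A and B)
    sse_reduced -- SSE of the smaller model (term A only)
    n           -- number of observations (assumed, not from the thread)
    """
    # With normal errors, 2 * (logLik_full - logLik_reduced)
    # simplifies to n * log(SSE_reduced / SSE_full).
    stat = max(n * math.log(sse_reduced / sse_full), 0.0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# SSE values from the example above; n = 50 is a hypothetical sample size.
stat, p = gaussian_lrt(sse_full=10.0, sse_reduced=12.0, n=50)
print(f"LR statistic = {stat:.3f}, p-value = {p:.4f}")
```

The statistic scales with n, so the same SSE difference of 2.0 is weaker evidence in a smaller data set. That is why the sample size enters the test.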

That's a conceptual overview. For R code, I would suggest starting with Faraway's book.

Hi Max, do you think there are two types of likelihood ratios? I understand the type you described but it seems to be different from the one used in the referenced article in my original question (e.g. how does the informative category fit into your example)?
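
For what it's worth, the LR in the quoted passage looks like the diagnostic-test kind, which is a different quantity from the likelihood ratio test above: it compares how often a positive (or negative) result occurs in people with the outcome versus without it. Below is a minimal sketch in Python of the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, together with the quoted interpretation bands. The 2x2 counts are invented and the function names are hypothetical. The excerpt does not say how the paper handles its risk groups or the band boundaries, so treat this as an assumption rather than the authors' method.

```python
def diagnostic_lrs(tp, fp, fn, tn):
    """Positive and negative likelihood ratios from a 2x2 table.

    tp, fn -- outcome present, test positive / negative
    fp, tn -- outcome absent,  test positive / negative
    Assumes no cell combination makes a denominator zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def interpret(lr):
    """Bands from the quoted passage; handling of the exact
    boundary values (0.1, 0.2, 5, 10) is an assumption."""
    if lr < 0.1 or lr > 10:
        return "informative"
    if lr <= 0.2 or lr >= 5:
        return "moderately informative"
    return "non-informative"


# Hypothetical counts: "positive" means predicted probability above
# some chosen cut-off.
lr_pos, lr_neg = diagnostic_lrs(tp=40, fp=30, fn=10, tn=420)
print(f"LR+ = {lr_pos:.2f} ({interpret(lr_pos)})")
print(f"LR- = {lr_neg:.2f} ({interpret(lr_neg)})")
```

Changing the cut-off changes the 2x2 table and therefore both LRs, so the calculation would be repeated for each cut-off or risk group of interest.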

Do you know how to select the predicted probability in the risk stratification table in the paper I cited in my original post (see Table 4)? Is the selection based on clinical knowledge?

No, I don't. I haven't tried to adapt the code in the paper.