Spline transformation of a predictor in logistic regression

I am working on a regression model and am hoping to apply a spline transformation to my continuous predictor (Pre.score). However, I am struggling to incorporate the spline transformation into a logistic regression with disease as the outcome. Any advice on how to fix my code below would be greatly appreciated.

Here is my attempt at a spline-transformed logistic regression:

knots <- quantile(data_B$Pre.score, probs = c(0.25, 0.5, 0.75))
# Build the model
library(splines)
model <- glm(disease ~ bs(Pre.score, knots = knots), data = data_B, family = binomial)
# Make predictions
pred.val.2 <- predict(model, type = "response")
# Model performance
library(caret)
data.frame(
  RMSE = RMSE(pred.val.2, data_B$disease),
  R2 = R2(pred.val.2, data_B$disease)
)
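A quick way to see what `bs()` contributes to the model is to build the basis on its own. The sketch below uses simulated scores as a hypothetical stand-in for `data_B$Pre.score` (not the poster's data). With three interior knots and the default cubic degree, `bs()` returns length(knots) + 3 = 6 columns, so the fitted model gets six spline coefficients plus an intercept.

```r
library(splines)

# Hypothetical stand-in for data_B$Pre.score: 47 simulated scores
set.seed(42)
pre_score <- runif(47, min = 0, max = 40)

# Interior knots at the quartiles, as in the question
knots <- quantile(pre_score, probs = c(0.25, 0.5, 0.75))

# Cubic B-spline basis (degree = 3 is the default, no intercept column)
basis <- bs(pre_score, knots = knots)

# One column per basis function: length(knots) + degree = 3 + 3 = 6
dim(basis)

# The knots bs() actually used: interior knots plus the data range as boundaries
attr(basis, "knots")
attr(basis, "Boundary.knots")
```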

Here is my data:

structure(list(Pre.score = c(15.87301587, 5.310939628, 
5.707491082, 3.089700997, 41.27569847, 4.200567644, 14.30503889, 
6.699928724, 4.148471616, 3.70212766, 19.41605839, 11.99368753, 
5.991232343, 4.42804428, 20.77562327, 3.432367595, 1.1574886, 
4.186655037, 29.27974948, 0.874453467, 13.66459627, 0.564652739, 
13.06160259, 19.66829268, 0, 8.623969562, 13.18289786, 3.505734689, 
4.769475358, 0.208768267, 1.23683005, 11.11934766, 0.567000567, 
4.077429984, 1.179835538, 2.582781457, 18.62888102, 0.540151242, 
9.014084507, 2.714285714, 15.17327505, 17.41935484, 11.01306036, 
13.53987378, 7.38645816, 18.69688385, 14.6287403), 
    disease = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
    1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 
    1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("none", "disease"
    ), class = "factor")), row.names = c(NA, 47L), class = "data.frame")

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.

For example, several packages have RMSE functions (and similarly named metrics), so we need to know which one you are using.

Also, you appear to be fitting a classification model but evaluating the model using regression performance metrics.

Thanks, Max! I have updated my original post (e.g., to specify which package I used). Does that help you better understand my question?

Also, could you please elaborate on your last comment: "you appear to be fitting a classification model but evaluating the model using regression performance metrics"? What do you mean?

So you are fitting a logistic regression model for classification.

RMSE and R2 are performance metrics for quantitative outcomes. There are many metrics that are more appropriate here, such as the area under the ROC curve, the binomial log-likelihood, and (to a lesser extent) classification accuracy.
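As a sketch of those three metrics computed in base R, here is the same kind of spline model fitted to simulated, hypothetical data (packages such as pROC or yardstick also provide these metrics; base R is used here to avoid assuming their exact interfaces). The outcome is converted to 0/1 with "disease" as the event, and the 0.5 cutoff used for accuracy is an assumption, not a recommendation.

```r
library(splines)

# Hypothetical stand-in data: factor outcome with the same levels as data_B$disease
set.seed(1)
n <- 200
x <- runif(n, 0, 40)
d <- data.frame(
  Pre.score = x,
  disease = factor(rbinom(n, 1, plogis(-2 + 0.15 * x)),
                   levels = c(0, 1), labels = c("none", "disease"))
)

knots <- quantile(d$Pre.score, probs = c(0.25, 0.5, 0.75))
model <- glm(disease ~ bs(Pre.score, knots = knots), data = d, family = binomial)

# Predicted probability of the event ("disease") and a 0/1 copy of the outcome
prob <- predict(model, type = "response")
y <- as.integer(d$disease == "disease")

# Binomial log-likelihood (higher is better); equals logLik(model) for 0/1 data
loglik <- sum(y * log(prob) + (1 - y) * log(1 - prob))

# Area under the ROC curve via the Mann-Whitney rank formula (ties get half credit)
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc <- auc_rank(prob, y)

# Classification accuracy at an assumed 0.5 probability cutoff
accuracy <- mean((prob > 0.5) == (y == 1))

c(logLik = loglik, AUC = auc, accuracy = accuracy)
```

Unlike RMSE on a factor, all three numbers use the predicted probabilities and the binary outcome directly; in practice they are better estimated on held-out data or via resampling than on the training rows.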

Yes, I realized that as well and will fix it. But what about the other parts of my R code? For example, are these parts correct?

# Build the model
library(splines)
model <- glm(disease ~ bs(Pre.score, knots = knots), data = data_B, family = binomial)
# Make predictions
pred.val.2 <- predict(model, type = "response")
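One detail worth checking in those lines is which probability `predict(..., type = "response")` returns. For a factor response, `glm()` with `family = binomial` treats the first level as failure and all other levels as success, so with levels `c("none", "disease")` the predictions are P(disease). The sketch below checks this on simulated, hypothetical stand-in data and shows that the `"response"` scale is the inverse logit of the `"link"` (log-odds) scale.

```r
library(splines)

# Hypothetical stand-in for data_B (simulated, not the poster's data)
set.seed(7)
n <- 100
score <- runif(n, 0, 40)
disease <- factor(rbinom(n, 1, plogis(-1 + 0.1 * score)),
                  levels = c(0, 1), labels = c("none", "disease"))
d <- data.frame(Pre.score = score, disease = disease)

knots <- quantile(d$Pre.score, probs = c(0.25, 0.5, 0.75))
model <- glm(disease ~ bs(Pre.score, knots = knots), data = d, family = binomial)

# First level = failure, so the model describes P(disease == "disease")
levels(d$disease)

# type = "response" gives probabilities; type = "link" gives log-odds
prob  <- predict(model, type = "response")
logit <- predict(model, type = "link")
all.equal(prob, plogis(logit))

# With an intercept in the model, the average fitted probability matches
# the observed proportion of "disease" cases
c(mean_fitted = mean(prob), observed = mean(d$disease == "disease"))
```

The last comparison relies on a general property of logistic regression with an intercept (the score equation for the intercept forces the fitted probabilities to sum to the number of events), so it is a cheap sanity check that the event level is the one you expect.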

Hi Max, I am following up on the spline transformation.
Do you think there is anything incorrect about the following code:

model <- glm(disease ~ bs(percentage, knots = knots), data = data_B, family = binomial)

Below is the output from the above code for another predictor (percentage). Could you please help me interpret it?

Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)
(Intercept)                        0.4845     1.1982   0.404    0.686
bs(percentage, knots = knots)1     2.7183     3.3578   0.810    0.418
bs(percentage, knots = knots)2    -1.7655     2.3966  -0.737    0.461
bs(percentage, knots = knots)3     2.1134     3.1838   0.664    0.507
bs(percentage, knots = knots)4    -6.1535    10.9851  -0.560    0.575
bs(percentage, knots = knots)5    -8.1928    56.9180  -0.144    0.886
bs(percentage, knots = knots)6 10636.9163  8119.2865   1.310    0.190

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 61.513  on 46  degrees of freedom
Residual deviance: 51.817  on 40  degrees of freedom
AIC: 65.817

Number of Fisher Scoring iterations: 14
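Individual B-spline coefficients have no direct interpretation: each one scales a basis function, not the effect of `percentage` itself. Two more useful summaries are a likelihood-ratio test and the fitted probability curve. The first lines below reuse the deviances printed in the output above (61.513 on 46 df for the null model, 51.817 on 40 df for the spline model). The rest is a sketch on simulated, hypothetical data that compares the spline model with a model using a linear term and tabulates the predicted probabilities over a grid.

```r
library(splines)

# 1) Overall likelihood-ratio test from the deviances printed above
lr_stat <- 61.513 - 51.817   # drop in deviance
lr_df   <- 46 - 40           # number of spline parameters
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value

# 2) Hypothetical simulated data standing in for the `percentage` model
set.seed(3)
n <- 150
percentage <- runif(n, 0, 100)
disease <- factor(rbinom(n, 1, plogis(-1 + 0.03 * percentage)),
                  levels = c(0, 1), labels = c("none", "disease"))
d <- data.frame(percentage, disease)

knots <- quantile(d$percentage, probs = c(0.25, 0.5, 0.75))
fit_linear <- glm(disease ~ percentage, data = d, family = binomial)
fit_spline <- glm(disease ~ bs(percentage, knots = knots), data = d, family = binomial)

# Does the spline improve on a straight line on the log-odds scale?
anova(fit_linear, fit_spline, test = "Chisq")

# Read the spline through its fitted curve rather than its coefficients
grid <- data.frame(percentage = seq(min(d$percentage), max(d$percentage),
                                    length.out = 50))
grid$prob <- predict(fit_spline, newdata = grid, type = "response")
grid[c(1, 10, 20, 30, 40, 50), ]
```

For the posted output, the test statistic of about 9.7 on 6 df falls between the 80% and 90% quantiles of the chi-squared distribution, so the p-value is between 0.10 and 0.20; plotting `grid$prob` against `grid$percentage` shows the shape the spline is fitting.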

I do not know. The code looks syntactically correct, but we have no idea what it is meant to do. By analogy, we can't tell whether you are trying to unscrew something with a butter knife.

Even if the code is correct, the estimate and standard error for the sixth spline parameter are excessively large.
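One plausible reading (an interpretation, not something confirmed in the thread): with 47 observations, seven parameters, and 14 Fisher scoring iterations, the last basis function may cover a region with very few or perfectly separated cases, so its estimate has no stable finite value. Separately, if `knots` still held the Pre.score quartiles when the `percentage` model was fitted, the knots would have been placed for the wrong variable. The sketch below, on simulated, hypothetical data, recomputes the knots from the predictor being modelled and compares the cubic B-spline with a natural cubic spline with fewer parameters (`ns(df = 3)`) and a linear term.

```r
library(splines)

# Hypothetical stand-in: a skewed predictor and only 47 rows (simulated values)
set.seed(11)
n <- 47
percentage <- rexp(n, rate = 1 / 10)
disease <- factor(rbinom(n, 1, plogis(-0.5 + 0.05 * percentage)),
                  levels = c(0, 1), labels = c("none", "disease"))
d <- data.frame(percentage, disease)

# Knots must come from the predictor actually being modelled
knots <- quantile(d$percentage, probs = c(0.25, 0.5, 0.75))

fits <- list(
  bs_3knots = glm(disease ~ bs(percentage, knots = knots), data = d, family = binomial),
  ns_df3    = glm(disease ~ ns(percentage, df = 3), data = d, family = binomial),
  linear    = glm(disease ~ percentage, data = d, family = binomial)
)

# Number of coefficients, AIC, and the largest standard error for each fit
sapply(fits, function(f) c(n_coef     = length(coef(f)),
                           AIC        = AIC(f),
                           max_abs_se = max(sqrt(diag(vcov(f))))))
```

A natural spline is constrained to be linear beyond its boundary knots, which tends to tame the tails where data are sparse. Comparing AIC and the largest standard error is only a rough screen; the number of degrees of freedom remains a modelling decision.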
