How to add back existing columns to model after building

Alas, it can be very difficult to reverse engineer a problem without a reprex (a reproducible example). It doesn't have to be all the data or even the same data, so long as the structure is the same. There are packages, such as {charlatan}, that generate fake data, which could substitute for the missing object dl.
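
As a minimal sketch of what such a stand-in could look like, the snippet below builds a small data frame from the built-in mtcars data. The choice of vs as a proxy for Status and of mpg and wt as predictors is purely an assumption for illustration; the real dl may be shaped differently.

```r
# Stand-in for the missing object dl, built from data that ships with R.
# Assumption: the real data has one binary outcome plus some predictors.
dl <- data.frame(
  Status = mtcars$vs,   # 0/1 column used here as a proxy for Status
  mpg    = mtcars$mpg,  # illustrative numeric predictor
  wt     = mtcars$wt    # illustrative numeric predictor
)

# Show the structure so that readers can confirm it matches the real data
str(dl)
```

Anyone can run this without access to the original data, which is the whole point of a reprex.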

If I substitute mtcars for dl, the next problem is createDataPartition, which is not in the namespace. I don't recognize the function off the top of my head, so I'd have to go hunting for it, because the line

function "createDataPartition"

is malformed.
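
For what it's worth, a function of that name is exported by the {caret} package. Assuming that is the one intended (the original post does not say), the split might look something like the sketch below. The stand-in data frame, the 70/30 proportion, and the base R fallback are all illustrative assumptions.

```r
# Stand-in data; the outcome is a factor so that the split can be stratified
dl <- data.frame(Status = factor(mtcars$vs), mpg = mtcars$mpg, wt = mtcars$wt)

set.seed(42)  # make the random split repeatable
if (requireNamespace("caret", quietly = TRUE)) {
  # Assumption: createDataPartition() from {caret} is the function meant.
  # list = FALSE returns a one-column matrix of row indices.
  idx <- caret::createDataPartition(dl$Status, p = 0.7, list = FALSE)[, 1]
} else {
  # Fallback if {caret} is not installed: a simple unstratified random split
  idx <- sample(nrow(dl), size = floor(0.7 * nrow(dl)))
}

training <- dl[idx, ]
testing  <- dl[-idx, ]
```

Calling the function as caret::createDataPartition, or loading the package with library(caret) first, puts it in scope.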

Looking at

  LogModel <- glm(Status ~ .,data=training,family=binomial, maxit=100)

I have to assume that Status is binary. Then I have to wonder what, after running glm and assigning the result to LogModel

  colnames(model_input_df)

is supposed to do, since glm objects don't have columns, and the return will be NULL. Then I have to wonder why LogModel is then overwritten by

  LogModel <- c(1, 2, 3, 4, 5,6,7,8,9)

which replaces the fitted model with a vector.
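
A short sketch of both points, using an illustrative stand-in for training (the column choices are assumptions): a fitted glm is a list-like model object, so colnames() has nothing to report, and storing the arbitrary vector under a different name leaves the fit intact.

```r
# Illustrative stand-in for the training data
training <- data.frame(Status = mtcars$vs, mpg = mtcars$mpg, wt = mtcars$wt)

LogModel <- glm(Status ~ ., data = training, family = binomial, maxit = 100)

class(LogModel)        # a fitted-model object, not a data frame or matrix
colnames(LogModel)     # NULL, because the object has no columns
names(coef(LogModel))  # the term names live in the coefficients instead

# Using a separate name keeps the fitted model available for later use
pred_values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
inherits(LogModel, "glm")  # still a model
```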

The creation of final_df makes syntactic sense

> final_df <- rbind(mtcars[, c(-1, -2,-3,-4,-5,-6,-7)], "pred_values" = LogModel)
> tail(final_df)
               vs am gear carb
Lotus Europa    1  1    5    2
Ford Pantera L  0  1    5    4
Ferrari Dino    0  1    5    6
Maserati Bora   0  1    5    8
Volvo 142E      1  1    4    2
pred_values     1  2    3    4

but it only works because everything in mtcars and LogModel is of the same class, numeric. The error message indicates that the columns of ml aren't all numeric.

> is.factor(mtcars$mpg)
[1] FALSE
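
Checking one column at a time gets tedious. A quick way to inspect every column at once is sketched below; the data frame mixed is hypothetical and exists only to show what a non-numeric column looks like in the result.

```r
# Class of every column in one call; for mtcars they are all "numeric"
sapply(mtcars, class)

# Hypothetical data frame with one factor column, for comparison
mixed <- data.frame(grp = factor(c("a", "b")), x = c(1.5, 2.5))
col_classes <- sapply(mixed, class)

# Names of the columns that are not numeric
non_numeric <- names(col_classes)[col_classes != "numeric"]
non_numeric
```

Running the same check on ml should show which columns clash with a numeric row.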

The reason for appending pred_values as a row to ml is doubly unclear because

  1. It's an arbitrary numeric, not the results of any model fit
  2. It's unclear how the augmented ml object is being used.

Then, if pred_values is supposed to be a predicted value of Status in a logistic model, you would need to dig out the log odds and convert them to probabilities. (See my post here, which is based on the standard text.) If, on the other hand, it's supposed to be the coefficient estimates for the independent variables, those need to be extracted from the model output. In either event, I don't think that I've ever seen either presented in the same table as the source data.
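
Both quantities can be pulled from a fitted glm with base R. The sketch below uses an illustrative stand-in data set (the column choices are assumptions). If the goal really is to see predictions next to the source data, one prediction per observation fits more naturally as a new column than as an appended row.

```r
# Illustrative stand-in data and model
dat <- data.frame(Status = mtcars$vs, mpg = mtcars$mpg, wt = mtcars$wt)
LogModel <- glm(Status ~ ., data = dat, family = binomial, maxit = 100)

# Predictions on the log-odds (link) scale and on the probability scale
log_odds <- predict(LogModel, newdata = dat, type = "link")
prob     <- predict(LogModel, newdata = dat, type = "response")

# plogis() is the inverse logit, so the two scales agree
all.equal(prob, plogis(log_odds))

# Coefficient estimates for the intercept and each independent variable
coef(LogModel)

# One predicted probability per observation, attached as a column
scored <- cbind(dat, pred_values = prob)
head(scored)
```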

All of which is to say

  1. Try to make it as clear as possible what the goal is.
  2. Try to make it as easy as possible to help progress toward the goal.