Since no one was able to assist, I worked out what is going on here. This data set comes from a text with a predefined exercise that expects you to use a 'Rapid Miner' approach to logistic regression.
That being said, the analysis we can provide is limited by the underlying data set. So we have:

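library(tidymodels)  # also attaches dplyr and ggplot2
train <- read.csv('ch9_training.csv')
test <- read.csv('ch9_scoring.csv')
train$X2nd_Heart_Attack <- factor(train$X2nd_Heart_Attack)  # glm() and logistic_reg() need a factor outcome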

predict test set proportions ------------------------------------------------------------

model <- glm(X2nd_Heart_Attack ~ Age + Marital_Status + Weight_Category, train, family = binomial )
test$X2nd_Heart_Attack <- predict(model, test, type = 'response')

test$X2nd_Heart_Attack <- ifelse(test$X2nd_Heart_Attack > 0.5, 1, 0 )
test %>% count(X2nd_Heart_Attack) %>% mutate(proportion = n / sum(n))
   X2nd_Heart_Attack   n proportion
 1                 0 334   0.484058
 2                 1 356   0.515942
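
For anyone who wants to try the thresholding step without the ch9 files, here is a minimal sketch on simulated data. The data frame `sim`, its `Attack` column and the simulated relationship with `Age` are all made up for illustration and are not taken from the book's data set.

```r
# Hypothetical simulated data, NOT the ch9 training/scoring files
set.seed(1)
sim <- data.frame(Age = round(runif(200, 40, 80)))
sim$Attack <- rbinom(200, 1, plogis((sim$Age - 60) / 5))

# Same idea as above: fit a logistic regression, then predict probabilities
fit <- glm(Attack ~ Age, data = sim, family = binomial)
p <- predict(fit, sim, type = 'response')   # probabilities between 0 and 1

# Hard code the probabilities as 1/0 with a 0.5 cutoff
hard <- ifelse(p > 0.5, 1, 0)

# Share of predicted 0s and 1s
prop.table(table(hard))
```

The 0.5 cutoff is only the conventional default; a different cutoff would shift the proportions.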

tidymodels fit a model with workflow(model, recipe) ------------------ show certainty of probability

# restore the raw probabilities (the column was overwritten with 1/0 values above)
test$X2nd_Heart_Attack <- predict(model, test, type = 'response')
model2 <- workflow() %>%
 add_model(logistic_reg() %>% set_engine(engine = 'glm')) %>%
 add_recipe(recipe(X2nd_Heart_Attack ~ Age + Marital_Status + Weight_Category, data = train)) %>%
 fit(data = train)

full_set <- cbind(test, predict(model2, test, type = 'prob'))

full_set[1,]

The model is about 91% confident that this man will not have a second heart attack. He has done stress management and has a trait anxiety score of 50 out of 100.

  Age Marital_Status Gender Weight_Category Cholesterol Stress_Management Trait_Anxiety X2nd_Heart_Attack  .pred_No  .pred_Yes
1  61              0      1               1         139                 1            50        0.08763056 0.9123694 0.08763056

full_set[11,]

   Age Marital_Status Gender Weight_Category Cholesterol Stress_Management Trait_Anxiety X2nd_Heart_Attack    .pred_No .pred_Yes
11  66              2      1               2         220                 0            60         0.9906442 0.009355812 0.9906442
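
In both rows the glm probability in X2nd_Heart_Attack matches .pred_Yes, and .pred_No is just 1 minus that. A small sketch of why, again on made-up data (`sim`, `Attack`, `pred_yes` and `pred_no` are hypothetical names): with a factor outcome, glm treats the first level as "failure", so type = 'response' returns the probability of the second level.

```r
# Hypothetical simulated data, NOT the ch9 files
set.seed(2)
sim <- data.frame(Age = round(runif(200, 40, 80)))
hit <- rbinom(200, 1, plogis((sim$Age - 60) / 5))
sim$Attack <- factor(ifelse(hit == 1, 'Yes', 'No'))

levels(sim$Attack)   # first level is the "failure" class for glm

fit <- glm(Attack ~ Age, data = sim, family = binomial)
pred_yes <- predict(fit, sim, type = 'response')  # P(second level)
pred_no <- 1 - pred_yes                           # complement

head(data.frame(sim, pred_no, pred_yes))
```

If the factor levels were ordered the other way round, the 'response' column would be the probability of the other class, so it is worth checking levels() before reading the numbers.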
# plot predicted probability against age, colored by marital status, one panel per weight category
ggplot(test, aes(Age, X2nd_Heart_Attack, color = factor(Marital_Status))) +
  geom_count(show.legend = FALSE) +
  geom_line(lwd = 0.5) +
  scale_color_manual('Marital Status',
                     values = c('#F98866', '#89DA59', '#80BD9E', '#FF420E'),
                     labels = c('single', 'widowed', 'married', 'divorced')) +
  facet_wrap(~Weight_Category,
             labeller = labeller(Weight_Category = c('0' = 'Normal',
                                                     '1' = 'Overweight',
                                                     '2' = 'Obese'))) +
  # logistic curve fitted to the plotted probabilities
  geom_smooth(method = 'glm', method.args = list(family = 'binomial'), formula = y ~ x, se = FALSE) +
  theme(axis.text = element_text(size = 14),
        axis.title = element_text(size = 14),
        strip.text.x = element_text(size = 14),
        legend.key.size = unit(x = 4, units = 'line'),
        legend.text = element_text(size = 14),
        legend.title = element_text(size = 14)) +
  ylab(label = 'Probability 2nd Heart Attack')

This is pretty much all we can do: either hard code the predicted probability of a second heart attack on the test set as 1/0, or plot the probabilities.
