# hi how are you today


Since no one was able to help, I worked out what is going on here. This data set comes from a textbook with predefined data, and the exercise asks you to use a RapidMiner-style approach to logistic regression: fit the model on a training set, then score a separate scoring set.
That said, the underlying data set limits how much analysis we can do. So we have:

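```r
library(tidymodels)  # attaches dplyr, ggplot2, parsnip, recipes, workflows
# stringsAsFactors = TRUE (assumed) so the Yes/No outcome is read as a factor
train <- read.csv('ch9_training.csv', stringsAsFactors = TRUE)
test  <- read.csv('ch9_scoring.csv', stringsAsFactors = TRUE)
```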

Predict the test-set proportions:

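```r
# Logistic regression fit on the training set
model <- glm(X2nd_Heart_Attack ~ Age + Marital_Status + Weight_Category,
             data = train, family = binomial)

# Predicted probability of a second heart attack for each scoring-set row
test$X2nd_Heart_Attack <- predict(model, test, type = 'response')

# Hard-code the prediction as 1/0 with a 0.5 cutoff, then tabulate
test$X2nd_Heart_Attack <- ifelse(test$X2nd_Heart_Attack > 0.5, 1, 0)
test %>% count(X2nd_Heart_Attack) %>% mutate(proportion = n / sum(n))
#>   X2nd_Heart_Attack   n proportion
#> 1                 0 334   0.484058
#> 2                 1 356   0.515942
```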
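To see what `type = 'response'` and the 0.5 cutoff actually do, here is a minimal sketch on simulated data. The `sim_train`/`sim_score` frames, the `age`/`event` columns and the coefficients are hypothetical, not taken from the ch9 files. It fits on a labelled frame, scores an unlabelled one, and shows that the predicted probabilities are the inverse logit of the linear predictor, so a 0.5 probability cutoff is the same as a log-odds cutoff of 0.

```r
# Sketch on SIMULATED data (hypothetical; not ch9_training.csv / ch9_scoring.csv).
set.seed(1)
sim_train <- data.frame(age = round(runif(200, 40, 80)))
# Hypothetical true model: log-odds of the event rise with age
sim_train$event <- rbinom(200, 1, plogis(-6 + 0.1 * sim_train$age))

# The scoring frame has no outcome column, like the "scoring" file
sim_score <- data.frame(age = round(runif(50, 40, 80)))

fit  <- glm(event ~ age, data = sim_train, family = binomial)
eta  <- predict(fit, sim_score, type = 'link')      # log-odds
prob <- predict(fit, sim_score, type = 'response')  # probabilities

# type = 'response' is the inverse logit (plogis) of the linear predictor
all.equal(unname(prob), unname(plogis(eta)))

# Cutting the probability at 0.5 is equivalent to cutting the log-odds at 0
pred_class <- ifelse(prob > 0.5, 1, 0)

# Proportion of each predicted class (base-R analogue of count() + mutate())
prop.table(table(pred_class))
```

The 0.5 threshold is a convention; choosing a different cutoff on `prob` would change the 1/0 proportions.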

Fit the same model with a tidymodels `workflow()` (model + recipe) to get the predicted probability of each class:

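```r
# Restore the probabilities (the previous step overwrote them with 1/0)
test$X2nd_Heart_Attack <- predict(model, test, type = 'response')

# Same logistic regression through a tidymodels workflow;
# logistic_reg() needs a factor outcome in the training data
model2 <- workflow() %>%
  add_model(logistic_reg() %>% set_engine(engine = 'glm')) %>%
  add_recipe(recipe(X2nd_Heart_Attack ~ Age + Marital_Status + Weight_Category, data = train)) %>%
  fit(data = train)

# Append the .pred_No / .pred_Yes class probabilities to the scoring data
full_set <- cbind(test, predict(model2, test, type = 'prob'))

full_set[1, ]
```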

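For row 1, the model gives a 91% probability that this man will not have a second heart attack. He has done stress management and has a trait anxiety score of 50 (out of 100), although neither variable is a predictor in this model, which uses only age, marital status and weight category.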

```r
#>   Age Marital_Status Gender Weight_Category Cholesterol Stress_Management Trait_Anxiety X2nd_Heart_Attack  .pred_No  .pred_Yes
#> 1  61              0      1               1         139                 1            50        0.08763056 0.9123694 0.08763056
full_set[11, ]
#>    Age Marital_Status Gender Weight_Category Cholesterol Stress_Management Trait_Anxiety X2nd_Heart_Attack    .pred_No .pred_Yes
#> 11  66              2      1               2         220                 0            60         0.9906442 0.009355812 0.9906442
```
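In the row 1 output, `.pred_No` (0.9123694) and `.pred_Yes` (0.08763056) sum to 1 up to printing precision, and the `glm()` probability stored in `X2nd_Heart_Attack` matches `.pred_Yes`. That is consistent with `glm()` treating the first factor level as "failure" and modelling the probability of the second level. The sketch below illustrates this on simulated data; `weight_cat` and `outcome` are hypothetical stand-ins, and it assumes the training outcome was a `No`/`Yes` factor, as the `.pred_No`/`.pred_Yes` column names suggest.

```r
# Sketch on SIMULATED data (hypothetical; not the ch9 files).
set.seed(42)
n <- 300
weight_cat <- sample(0:2, n, replace = TRUE)
outcome <- factor(ifelse(runif(n) < plogis(-1 + 0.8 * weight_cat), 'Yes', 'No'))
levels(outcome)  # factor levels are sorted: "No" first, "Yes" second

# With a factor response, glm() models P(second level), i.e. P(Yes)
fit <- glm(outcome ~ weight_cat, family = binomial)
p_yes <- predict(fit, data.frame(weight_cat = 0:2), type = 'response')
p_no  <- 1 - p_yes

# Laid out like .pred_No / .pred_Yes: each row sums to 1
data.frame(weight_cat = 0:2, pred_No = p_no, pred_Yes = p_yes)

# Recoding the outcome as 1 = "Yes", 0 = "No" gives the same coefficients
fit01 <- glm(as.numeric(outcome == 'Yes') ~ weight_cat, family = binomial)
all.equal(coef(fit), coef(fit01))
```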
```r
ggplot(test, aes(Age, X2nd_Heart_Attack, color = factor(Marital_Status))) +
  geom_count(show.legend = FALSE) +
  geom_line(lwd = 0.5) +
  scale_color_manual('Marital Status',
                     values = c('#F98866', '#89DA59', '#80BD9E', '#FF420E'),
                     labels = c('single', 'widowed', 'married', 'divorced')) +
  facet_wrap(~Weight_Category,
             labeller = labeller(Weight_Category = c('0' = 'Normal',
                                                     '1' = 'Overweight',
                                                     '2' = 'Obese'))) +
  geom_smooth(method = 'glm', method.args = list(family = 'binomial'),
              formula = y ~ x, se = FALSE) +
  theme(axis.text = element_text(size = 14),
        axis.title = element_text(size = 14),
        strip.text.x = element_text(size = 14),
        legend.key.size = unit(x = 4, units = 'line'),
        legend.text = element_text(size = 14),
        legend.title = element_text(size = 14)) +
  ylab(label = 'Probability 2nd Heart Attack')
```

That is about all we can do with this data set: hard-code the predicted outcome on the test (scoring) set as 1/0, or plot the predicted probabilities.
