Since no one was able to assist, I worked out what is going on here myself. This data set is predefined and comes from a text that wants you to use a RapidMiner approach to logistic regression.
That being said, the analysis is limited to what the underlying data set supports. So we have:
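library(tidymodels)  # also attaches dplyr and ggplot2, which are used below
# assumes both files are in the working directory; stringsAsFactors = TRUE reads a text outcome (e.g. Yes/No) as a factor
train <- read.csv('ch9_training.csv', stringsAsFactors = TRUE)
test <- read.csv('ch9_scoring.csv', stringsAsFactors = TRUE)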
# predict test set proportions ----
model <- glm(X2nd_Heart_Attack ~ Age + Marital_Status + Weight_Category, train, family = binomial )
test$X2nd_Heart_Attack <- predict(model, test, type = 'response')
test$X2nd_Heart_Attack <- ifelse(test$X2nd_Heart_Attack > 0.5, 1, 0 )
test %>% count(X2nd_Heart_Attack) %>% mutate(proportion = n / sum(n))
  X2nd_Heart_Attack   n proportion
1                 0 334   0.484058
2                 1 356   0.515942
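For anyone wondering what `type = 'response'` returns in the step above: for a binomial `glm` it is the fitted probability, i.e. the inverse logit of the linear predictor. Here is a minimal sketch of that relationship. It uses the built-in `mtcars` data as a stand-in, since the heart attack files are not reproduced here, so the variables `am` and `wt` are purely illustrative.

```r
# Stand-in data: built-in mtcars, NOT the ch9 heart attack files.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Linear predictor (log-odds) and fitted probability for the same rows
log_odds <- predict(fit, mtcars, type = 'link')
prob     <- predict(fit, mtcars, type = 'response')

# type = 'response' should match the inverse logit of type = 'link'
all.equal(prob, plogis(log_odds))

# Hard classification at a 0.5 cutoff, mirroring the ifelse() step in the post
pred_class <- ifelse(prob > 0.5, 1, 0)
table(pred_class)
```

The same reasoning should carry over to the heart attack model: the value stored in `test$X2nd_Heart_Attack` before the `ifelse()` call is a probability between 0 and 1, not a class label.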
# tidymodels: fit a model with workflow(model, recipe) and show the predicted class probabilities ----
test$X2nd_Heart_Attack <- predict(model, test, type = 'response')
model2 <- workflow() %>%
  add_model(logistic_reg() %>% set_engine(engine = 'glm')) %>%
  add_recipe(recipe(X2nd_Heart_Attack ~ Age + Marital_Status + Weight_Category, data = train)) %>%
  fit(data = train)
full_set <- cbind(test, predict(model2, test, type = 'prob'))
full_set[1,]
The model gives a 91% probability that this man will not have a second heart attack. He has completed stress management and has a trait anxiety score of 50 out of 100.
  Age Marital_Status Gender Weight_Category Cholesterol Stress_Management Trait_Anxiety X2nd_Heart_Attack  .pred_No  .pred_Yes
1  61              0      1               1         139                 1            50        0.08763056 0.9123694 0.08763056
full_set[11,]
   Age Marital_Status Gender Weight_Category Cholesterol Stress_Management Trait_Anxiety X2nd_Heart_Attack    .pred_No .pred_Yes
11  66              2      1               2         220                 0            60         0.9906442 0.009355812 0.9906442
ggplot(test, aes(Age, X2nd_Heart_Attack, color = factor(Marital_Status))) +
  geom_count(show.legend = FALSE) +
  geom_line(lwd = 0.5) +
  scale_color_manual('Marital Status', values = c('#F98866', '#89DA59', '#80BD9E', '#FF420E'),
                     labels = c('single', 'widowed', 'married', 'divorced')) +
  facet_wrap(~Weight_Category,
             labeller = labeller(Weight_Category = c('0' = 'Normal',
                                                     '1' = 'Overweight',
                                                     '2' = 'Obese'))) +
  geom_smooth(method = 'glm', method.args = list(family = 'binomial'), formula = y ~ x, se = FALSE) +
  theme(axis.text = element_text(size = 14),
        axis.title = element_text(size = 14),
        strip.text.x = element_text(size = 14),
        legend.key.size = unit(x = 4, units = 'line'),
        legend.text = element_text(size = 14),
        legend.title = element_text(size = 14)) +
  ylab(label = 'Probability 2nd Heart Attack')
This is pretty much all we can do: either hard-code the predicted probability of a second heart attack in the test set as 1/0, or plot the probabilities.
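One caveat on the 1/0 coding: the 0.5 cutoff is a choice, not something the model dictates. As a rough sketch, using the built-in `mtcars` data rather than the heart attack files, this shows how the share of predicted positives shifts with the cutoff. The helper `share_positive` is a made-up name for illustration, and `am` and `wt` are stand-in variables.

```r
# Stand-in model on built-in data; variables are illustrative only.
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(fit, mtcars, type = 'response')

# Hypothetical helper: proportion of rows classified as 1 at a given cutoff
share_positive <- function(p, cutoff) mean(p > cutoff)

# Compare a few cutoffs; a higher cutoff can only keep or reduce the share of 1s
cutoffs <- c(0.3, 0.5, 0.7)
data.frame(cutoff = cutoffs,
           proportion_predicted_1 = sapply(cutoffs, share_positive, p = prob))
```

Running the same comparison on the scoring set would show how sensitive the 48%/52% split is to the cutoff.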