Hello, I've run into a big problem recently.
I built a logistic regression model for a binary classification task. This is my code:
####### set a threshold to convert the dependent variable into 0 and 1 (up to this point it is continuous)
q = quantile(df$dependentvariable, 0.7)
df$exposure = ifelse(df$dependentvariable >= q, 1, 0) # 1 if the value is in the top 30%, otherwise 0
####### split the dataset into a training set and a test set
part = caret::createDataPartition(df$dependentvariable, p = 0.7)
idx = as.vector(part[[1]])
training = df[idx, ]
test = df[-idx, ] # 70% of data for train, 30% of data for test
####### model fitting
training_model = glm(exposure ~ independent1 + independent2, data = training, family = binomial) # response is the 0/1 label, not the continuous variable
summary(training_model)
####### prediction
predict_model = predict(training_model, newdata = test, type = "response")
####### calculating model accuracy
tab = table(predict_model >= 0.7, test$exposure) # compare against the 0/1 label
accuracy = sum(diag(tab))/sum(tab)*100
The code runs, but I'm not sure I did it correctly, because the accuracy it reports is only 70%. I hope the problem is not in the preprocessing or in the data itself, so I want to figure out the following.
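One thing worth checking first: when about 30% of the rows are labelled 1, a model that never predicts 1 is already about 70% accurate. The minimal sketch below uses made-up probabilities (the values in `prob` and `actual` are hypothetical, not from the data above) to show this, and fixes the factor levels so the confusion table stays 2 x 2 even when no probability reaches the cutoff.

```r
# Hypothetical predicted probabilities and true 0/1 labels for a 10-row test set
prob   <- c(0.10, 0.20, 0.15, 0.22, 0.28, 0.05, 0.25, 0.30, 0.60, 0.45)
actual <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)   # 30% positives, like a 0.7 quantile split

cutoff <- 0.7

# Fixed levels keep the table 2 x 2 even if every prediction is 0
predicted <- factor(as.integer(prob >= cutoff), levels = c(0, 1))
observed  <- factor(actual, levels = c(0, 1))
tab <- table(predicted, observed)

accuracy <- sum(diag(tab)) / sum(tab) * 100

# Majority-class baseline: accuracy of always predicting the most common label
baseline <- max(mean(actual), 1 - mean(actual)) * 100

print(tab)
print(c(accuracy = accuracy, baseline = baseline))
```

Here every probability is below 0.7, so every row is predicted as 0 and the accuracy is 70%, exactly the baseline. If the real table shows few or no predicted 1s, a 70% accuracy may come from the cutoff rather than from the preprocessing.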
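First, the code uses 0.7 as a threshold in two places: the 70th percentile that turns the dependent variable into 0/1, and the probability cutoff in the accuracy table. Is it right to use the same value in both places?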
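The two 0.7 values do different jobs and need not match. `quantile(..., 0.7)` decides which rows are labelled 1 (roughly the top 30%, although ties at the threshold can push the share higher because of `>=`), while the cutoff applied to `predict(..., type = "response")` decides when a predicted probability counts as a 1. A common starting point for that cutoff is 0.5, and it can also be tuned. The sketch below uses hypothetical toy values (not the poster's data) to illustrate both.

```r
# Part 1: the quantile threshold sets the label prevalence
x     <- 1:10                       # hypothetical continuous outcome
q     <- quantile(x, 0.7)           # 7.3 with R's default type = 7
label <- ifelse(x >= q, 1, 0)       # 0 0 0 0 0 0 0 1 1 1
mean(label)                         # share of 1s: 0.3

# Part 2: the probability cutoff is a separate choice
prob   <- c(0.10, 0.20, 0.15, 0.22, 0.28, 0.05, 0.25, 0.30, 0.60, 0.45)
actual <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# Accuracy (%) for a given cutoff on the predicted probability
accuracy_at <- function(cutoff) mean(as.integer(prob >= cutoff) == actual) * 100

cutoffs <- c(0.3, 0.4, 0.5, 0.7)
acc <- sapply(cutoffs, accuracy_at)
names(acc) <- cutoffs
print(acc)
```

With these toy values the accuracy drops from 100% at a 0.3 cutoff to 70% at 0.7. In practice the cutoff should be chosen on the training data (or a separate validation split), not on the test set; otherwise the reported test accuracy is optimistic.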
Second, is my code correct? In particular, am I passing the training data and the test data to the right functions?
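Fitting on `training` and passing `test` as `newdata` to `predict()` is the usual pattern. The sketch below runs that flow end to end on simulated data (all values are hypothetical) and, as one option, replaces caret's `createDataPartition()` with a hand-written base-R split stratified on the 0/1 label, so the share of 1s is the same in both sets. Passing the 0/1 label to `createDataPartition()` as a factor should give similar stratification, since caret samples within each class level for factor outcomes.

```r
set.seed(1)

# Simulated data standing in for df (hypothetical)
n  <- 200
df <- data.frame(independent1 = rnorm(n), independent2 = rnorm(n))
df$dependentvariable <- 1.5 * df$independent1 - df$independent2 + rnorm(n)

# 0/1 label: top 30% of the continuous outcome
q <- quantile(df$dependentvariable, 0.7)
df$exposure <- ifelse(df$dependentvariable >= q, 1, 0)

# Stratified 70/30 split: sample 70% of the row indices within each class
rows_by_class <- split(seq_len(n), df$exposure)
idx <- unlist(lapply(rows_by_class, function(rows)
  rows[sample.int(length(rows), size = round(0.7 * length(rows)))]))

training <- df[idx, ]
test     <- df[-idx, ]

# Fit on the training rows only, with the 0/1 label as the response
training_model <- glm(exposure ~ independent1 + independent2,
                      data = training, family = binomial)

# Predicted probabilities P(exposure = 1) for the held-out test rows
predict_model <- predict(training_model, newdata = test, type = "response")

# Share of 1s should be about 0.3 in both sets
print(c(train = mean(training$exposure), test = mean(test$exposure)))
```

Note that `glm(..., family = binomial)` expects a 0/1 (or proportion) response; with a continuous outcome outside [0, 1] it stops with the error "y values must be 0 <= y <= 1".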
Thank you.