Hi R community,
today I installed the latest version of R (4.3.0) and RStudio and ran into trouble with my code for a logistic regression.
- My code predicts types of leaders (A, B) on a dataset of 409 participants.
- When running the code in the older (4.3.2) version, I received accuracies well above .73 for both the train and test sets.
- However, after the update I receive .62 for the test set, which is a very sharp decline. I did not change the code or the data set.
I have googled this topic extensively but could not find an answer to my question.
Here is the code for the test set (the code for the train set is almost identical, with "train" instead of "test"):
id_train <- sample(1:409, 286, replace = FALSE)
data_new$id_numeric <- 1:nrow(data_new)
train_cl2 <- subset(data_new, id_numeric %in% id_train)
test_cl2 <- subset(data_new, !id_numeric %in% id_train)
# Logistic regression model
glm.leadertype <- glm(Leadertype ~ ., family = "binomial", data = train_cl2)
# Predict on the test data using the model
glm.predict.leadertype <- predict(glm.leadertype, test_cl2, type = "response")
# Convert predicted probabilities to categories
test_cl2$predict.Leadertype <- ifelse(glm.predict.leadertype >= .5, "category A", "category B")
# Determine accuracy of the model on the test set
accuracy_test <- mean(test_cl2$predict.Leadertype == test_cl2$Leadertype)
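In case it helps to reproduce the setup without my data, here is a minimal sketch on simulated data. Everything in it is made up for illustration (data_sim, x1, x2, the coefficients, and the helper split_accuracy); only the 409/286 split mirrors my code. It fixes each split with set.seed(), because sample() otherwise draws a different split on every run, and it shows how much the test accuracy moves from one random split to the next.

```r
# Simulated stand-in for data_new (hypothetical: 409 rows, two numeric predictors)
set.seed(1)  # fixes the simulated data
n <- 409
data_sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p_b <- plogis(0.8 * data_sim$x1 - 0.5 * data_sim$x2)  # assumed true P(type "B")
data_sim$Leadertype <- factor(ifelse(runif(n) < p_b, "B", "A"), levels = c("A", "B"))

# Test accuracy for one random 286/123 split, made repeatable by the seed
split_accuracy <- function(seed, data) {
  set.seed(seed)
  id_train <- sample(seq_len(nrow(data)), 286, replace = FALSE)
  train <- data[id_train, ]
  test <- data[-id_train, ]
  fit <- glm(Leadertype ~ ., family = "binomial", data = train)
  # For a factor response, glm() models the probability of the second level ("B")
  prob_b <- predict(fit, test, type = "response")
  pred <- ifelse(prob_b >= 0.5, "B", "A")
  mean(pred == test$Leadertype)
}

# Accuracy across 20 different splits of the same data
acc <- sapply(1:20, split_accuracy, data = data_sim)
range(acc)
```

Calling split_accuracy() twice with the same seed returns the same value, while different seeds give different test accuracies even though the data and the model formula stay the same.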
And here's the session info:
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
 LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8
 LC_NUMERIC=C LC_TIME=German_Germany.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
 stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
 compiler_4.3.0 tools_4.3.0 rstudioapi_0.14
I am very grateful for any kind of help and can also provide my dataset if necessary.
Thank you very much in advance!