Hi All,
Kindly help me with a question. My colleagues and I are running the code below, which uses caret's train function. We are all on the most up-to-date versions of RStudio, R, and our libraries, yet we get different results. For example, the highest accuracy I get is with mtry = 50 (the first row of the results below). When they run the same code, mtry = 100 has the highest accuracy. I'm sure you will get 100 when you run the code below. What explains the difference?
Quick notes: the script takes about 5 minutes to run. I have also provided my session info.
library(caret)   # provides train(); method = "rf" also needs randomForest installed
library(dslabs)
library(rpart)
data("tissue_gene_expression")
set.seed(1991)
x <- tissue_gene_expression$x
y <- tissue_gene_expression$y
set.seed(1991)
fit <- with(tissue_gene_expression,
            train(x, y, method = "rf",
                  nodesize = 1, tuneGrid = data.frame(mtry = seq(50, 200, 25))))
fit$results
mtry Accuracy Kappa AccuracySD KappaSD
#1 50 0.9969167 0.9963149 0.008020859 0.009554839
#2 75 0.9940750 0.9928824 0.010599365 0.012700530
#3 100 0.9953688 0.9944118 0.010914864 0.013159516
#4 125 0.9955857 0.9946730 0.011409477 0.013757460
#5 150 0.9939767 0.9927032 0.014178379 0.017140433
#6 175 0.9921557 0.9905142 0.015418882 0.018628942
#7 200 0.9921557 0.9904997 0.014873157 0.017973473
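One rough way to judge how much the ranking of mtry values means is to compare the accuracy gaps with the resampling spread. The sketch below uses only the Accuracy and AccuracySD columns printed above; the "within one SD of the best row" yardstick is an illustrative assumption, not something caret reports.

```r
# Values copied from the fit$results table above.
res <- data.frame(
  mtry       = seq(50, 200, 25),
  Accuracy   = c(0.9969167, 0.9940750, 0.9953688, 0.9955857,
                 0.9939767, 0.9921557, 0.9921557),
  AccuracySD = c(0.008020859, 0.010599365, 0.010914864, 0.011409477,
                 0.014178379, 0.015418882, 0.014873157)
)

# Row with the highest mean resampled accuracy.
best <- res[which.max(res$Accuracy), ]

# Gap between each row and the best row, compared with the best row's SD.
res$gap_to_best   <- best$Accuracy - res$Accuracy
res$within_one_sd <- res$gap_to_best <= best$AccuracySD
print(res)
```

Every row falls within one SD of the best row, so a small change in which resamples are drawn could plausibly move the top spot from 50 to 100.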
R 3.6 changed the default sampling method, so the same set.seed value no longer gives the same draws as older versions (https://github.com/wch/r-source/blob/8c1c78a/src/library/base/man/Random.Rd#L173-L175). I haven't run across a solution. If you have a 3.5.3 host, run the same code; dollars to donuts you won't see the problem.
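For anyone who wants to test this theory without a 3.5.3 host, the linked Random.Rd page documents a sample.kind argument. The following is a minimal sketch, assuming R >= 3.6.0, that draws the same permutation under both samplers; it does not touch caret and only shows how the setting is switched.

```r
# Assumes R >= 3.6.0; the sample.kind argument does not exist in older versions.
set.seed(1991)
draw_rejection <- sample(10)   # default sampler since R 3.6.0 ("Rejection")

# "Rounding" is the pre-3.6.0 sampler; R warns that it is non-uniform.
suppressWarnings(set.seed(1991, sample.kind = "Rounding"))
draw_rounding <- sample(10)

# Restore the current default so later code is unaffected.
RNGkind(sample.kind = "Rejection")

print(draw_rejection)
print(draw_rounding)
```

In the script from the question, the equivalent change would be calling set.seed(1991, sample.kind = "Rounding") on the R 3.6 machines. That only helps if the mismatch really comes from the sampler change.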
Unfortunately, our results differ regardless of the version of R. I upgraded to 3.6 hoping to fix the issue.
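Since the versions alone do not settle it, it may help to confirm that every machine reports the same random number settings before comparing results. This is a small sketch using base R only; the settings list and its field names are made up for illustration.

```r
# Run on each machine and compare the printed values side by side.
kinds <- RNGkind()  # generator, normal generator, and (R >= 3.6.0) sample kind

settings <- list(
  r_version = R.version.string,
  rng_kinds = kinds
)
str(settings)
```

If these differ between machines, the same seed is not expected to give the same resamples.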
Here's what I get with successive runs:
fit$results
mtry Accuracy Kappa AccuracySD KappaSD
1 50 0.9964757 0.9957840 0.009039221 0.01078329
2 75 0.9963612 0.9956487 0.009195473 0.01096989
3 100 0.9958712 0.9950407 0.009294355 0.01111398
4 125 0.9948835 0.9938393 0.011299499 0.01358540
5 150 0.9951198 0.9940812 0.014035389 0.01697160
6 175 0.9945401 0.9933828 0.014124440 0.01707680
7 200 0.9926509 0.9911178 0.015151838 0.01826565
fit$results
mtry Accuracy Kappa AccuracySD KappaSD
1 50 0.9964757 0.9957840 0.009039221 0.01078329
2 75 0.9963612 0.9956487 0.009195473 0.01096989
3 100 0.9958712 0.9950407 0.009294355 0.01111398
4 125 0.9948835 0.9938393 0.011299499 0.01358540
5 150 0.9951198 0.9940812 0.014035389 0.01697160
6 175 0.9945401 0.9933828 0.014124440 0.01707680
7 200 0.9926509 0.9911178 0.015151838 0.01826565
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rpart_4.1-15 dslabs_0.5.2 caret_6.0-84 ggplot2_3.1.1 lattice_0.20-38
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 pillar_1.3.1 compiler_3.6.0 gower_0.2.0 plyr_1.8.4
[6] iterators_1.0.10 class_7.3-15 tools_3.6.0 ipred_0.9-9 lubridate_1.7.4
[11] tibble_2.1.1 nlme_3.1-139 gtable_0.3.0 pkgconfig_2.0.2 rlang_0.3.4
[16] Matrix_1.2-17 foreach_1.4.4 prodlim_2018.04.18 e1071_1.7-1 stringr_1.4.0
[21] withr_2.1.2 dplyr_0.8.0.1 generics_0.0.2 recipes_0.1.5 stats4_3.6.0
[26] grid_3.6.0 nnet_7.3-12 tidyselect_0.2.5 data.table_1.12.2 glue_1.3.1
[31] R6_2.4.0 survival_2.44-1.1 lava_1.6.5 reshape2_1.4.3 purrr_0.3.2
[36] magrittr_1.5 ModelMetrics_1.2.2 scales_1.0.0 codetools_0.2-16 MASS_7.3-51.4
[41] splines_3.6.0 randomForest_4.6-14 assertthat_0.2.1 timeDate_3043.102 colorspace_1.4-1
[46] stringi_1.4.3 lazyeval_0.2.2 munsell_0.5.0 crayon_1.3.4