Applied Machine Learning Workshop - rstudio::conf 2020

Applied Machine Learning Workshop

1/27/20 to 1/28/20
9:00 AM-5:00 PM
2 Day Workshop
Continental Ballroom Rooms 4 (Ballroom Level)

Davis Vaughan
Software Engineer
RStudio

Max Kuhn
Applied Machine Learning
RStudio

Machine learning is the study and application of algorithms that learn from and make predictions on data. From search results to self-driving cars, it has manifested itself in all areas of our lives and is one of the most exciting and fast-growing fields of research in the world of data science. This two-day course will provide an overview of using R for supervised learning. The session will step through the process of building, visualizing, testing, and comparing models that are focused on prediction. The goal of the course is to provide a thorough workflow in R that can be used with many different regression or classification techniques. Case studies on real data will be used to illustrate the functionality, and several different predictive models will be demonstrated. The course covers both low- and high-level approaches to modeling using the tidyverse. Basic familiarity with R and the tidyverse is required.


The notes can be found at https://github.com/rstudio-conf-2020/applied-ml.

Please post any questions or issues with installations here.


I'm looking forward to the workshop. I'm just curious whether it would be possible to bring an external monitor to use for the exercises.

Yes, that's fine. You might want to come a little early to set it up (and to find a spot with a power outlet; those are sometimes scarce).


tune failed to install from both GitHub and CRAN. Walkthrough below:

Error from CRAN:

Warning message:
package ‘tune’ is not available (for R version 3.6.2) 

Error from GitHub:

Downloading GitHub repo tidymodels/tidymodels@master
Skipping 21 packages not available: broom, cli, crayon, dials, dplyr, ggplot2, infer, magrittr, parsnip, pillar, purrr, recipes, rlang, rsample, rstudioapi, tibble, tidytext, tidypredict, tidyposterior, workflows, yardstick
Downloading GitHub repo tidymodels/tune@master
Skipping 18 packages not available: dplyr, rlang, tibble, purrr, dials, recipes, ggplot2, glue, cli, crayon, yardstick, rsample, tidyr, GPfit, foreach, parsnip, workflows, hardhat
   checking DESCRIPTION meta-information ...
Installing package into ‘/Users/Chris/Library/R/3.6/library’
(as ‘lib’ is unspecified)
ERROR: dependencies ‘GPfit’, ‘workflows’, ‘hardhat’ are not available for package ‘tune’
* removing ‘/Users/Chris/Library/R/3.6/library/tune’
Error: Failed to install 'tidymodels' from GitHub:
  Failed to install 'tune' from GitHub:
  (converted from warning) installation of package ‘/var/folders/ql/8bhpzkz551d30l6s30lj6zd00000gp/T//Rtmpvw9LHW/file22193ff4b658/tune_0.0.1.tar.gz’ had non-zero exit status

I manually installed GPfit and workflows (which installed hardhat), then reran devtools::install_github("tidymodels/tune"), and it worked.

Here is my sessionInfo():

sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3        magrittr_1.5     
 [3] usethis_1.5.1     devtools_2.2.1   
 [5] pkgload_1.0.2     R6_2.4.1         
 [7] rlang_0.4.2       fansi_0.4.1      
 [9] tools_3.6.2       pkgbuild_1.0.6   
[11] sessioninfo_1.1.1 cli_2.0.1        
[13] withr_2.1.2       ellipsis_0.3.0   
[15] remotes_2.1.0     assertthat_0.2.1 
[17] digest_0.6.23     rprojroot_1.3-2  
[19] crayon_1.3.4      processx_3.4.1   
[21] callr_3.4.0       fs_1.3.1         
[23] ps_1.3.0          curl_4.3         
[25] testthat_2.3.1    memoise_1.1.0    
[27] glue_1.3.1        compiler_3.6.2   
[29] desc_1.2.0        backports_1.1.5  
[31] prettyunits_1.1.0

Sorry to hear about your trouble installing the packages, but I'm glad you figured out a workaround.

As I understand it, @Max had expected {tune} to be on CRAN by the time the workshop starts, but that hasn't happened yet, which is why install.packages("tune") errored out.

I'm not sure what caused the error with remotes::install_github("tidymodels/tune") that you mentioned; that really should have worked, and I couldn't reproduce the message even though my system is similar.

Let me know if you still run into any problems.

FYI, if you have questions or issues you'd like to see addressed, you can also:

  • raise them right here; the TAs get a notification and we'll let @max and @davis know.
  • raise them on the Gitter chat for the workshop, which Grace has kindly set up.

Hi Max,

On the subject of tuning parameters, here is a paper that argues for random search over a regular grid. If you find it useful, perhaps you could add this reference to your material?

I find parameter tuning more of an art than a science. Iterative searches and coarse- and fine-grained searches are each useful at different times, but I'm not aware of a good algo or heuristic for deciding how to go about it. I think it depends on the data you're working with, along with the algo you use. For example, xgboost's tree_method defaults to auto, but for one data set I worked on, exact trained faster than auto, which seemed counterintuitive. Please share any insights you have. Thank you.

-Denny

Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281-305.
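As a rough illustration of the grid-versus-random distinction in that paper, here is a minimal base-R sketch that spends the same budget of 16 candidate points on a regular grid and on random search. The two tuning parameters (learn_rate and tree_depth) and their ranges are hypothetical placeholders, and no model is actually fit; the sketch only shows how the candidate points are generated.

```r
# Minimal sketch: regular grid vs. random search over two HYPOTHETICAL
# tuning parameters. Only candidate generation is shown; a real search
# would fit and evaluate a model at every candidate point.
set.seed(2020)

n_points <- 16  # same evaluation budget for both strategies

# Regular grid: 4 x 4 = 16 combinations
# (learn_rate spaced evenly on a log10 scale)
regular_grid <- expand.grid(
  learn_rate = 10^seq(-3, -1, length.out = 4),
  tree_depth = round(seq(2, 10, length.out = 4))
)

# Random search: 16 independent draws over the same ranges
random_points <- data.frame(
  learn_rate = 10^runif(n_points, min = -3, max = -1),
  tree_depth = sample(2:10, n_points, replace = TRUE)
)

# Count how many distinct values of each parameter each strategy tries
sapply(regular_grid, function(x) length(unique(x)))
sapply(random_points, function(x) length(unique(x)))
```

With the same budget, the grid only ever tries 4 distinct values of each parameter, whereas random search almost always tries more (every learn_rate draw is distinct). If only one of the parameters really matters, the grid spends most of its budget re-testing the same few values of it, which is the core of the argument for random search.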
