I am looking for best practices for working with RStudio Open Source installed on an (Azure) Databricks cluster. Locally, I currently organize my R code in projects, which is considered best practice on a local machine. However, is this also the suggested way of working with R code on Databricks, given that I want to use RStudio Open Source as the IDE instead of the Databricks notebook IDE?
Questions / Discussion points:
- Should I organize my R code in projects?
- Where should I save my R code?
- Other tips on how to work with RStudio Open Source on Databricks? For example, what are the pros and cons of installing packages via the Databricks UI versus `install.packages()`?
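
For context on the last point, here is roughly what I do today in the RStudio session on the driver node. This is just a sketch; the package names and CRAN mirror are placeholders, and my understanding of where the packages end up may well be wrong (which is part of what I am asking about):

```r
# Install packages directly from the RStudio session on the driver node
# (package names and CRAN mirror are only examples):
install.packages(
  c("dplyr", "ggplot2"),
  repos = "https://cran.r-project.org"
)

# Check that the packages load and see which library path was used.
# My understanding is that packages installed this way live only in the
# driver's local library and are gone once the cluster terminates,
# whereas libraries added via the Databricks UI are reinstalled on
# cluster start:
library(dplyr)
.libPaths()
```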
This introductory video on how to use RStudio on Azure Databricks is somewhat useful, but it does not discuss the points that I have listed above.
In general, my impression is that Databricks provides far less practical information and far fewer code examples on how to use its platform with R than with Python and Scala, which I think is a shame for the R community.
Any comments, suggestions, and links to resources are most welcome.