I'm using Microsoft Azure and want to work in RStudio throughout a project I'm working on. These are the steps I need to carry out:
- Read in data from the data lake
- Data preparation, cleaning etc.
- Create forecasts for thousands of products using machine learning
- Use the forecasts from the previous step in a Shiny dashboard.
I need a pipeline so that when the data is updated (every week), a scheduled R script cleans the data and creates the forecasts, and then the results are pushed to Shiny.
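To make the goal concrete, here is a minimal sketch of the kind of script I would like to schedule, written in base R only. Everything in it is a placeholder and an assumption: the column names (`product`, `week`, `sales`), the function names, the file paths, and especially the "model", which is just a mean of recent sales standing in for the real machine learning step.

```r
# Hypothetical weekly pipeline script. Column names, paths, and the
# forecasting logic are placeholders, not my real setup.

clean_sales <- function(df) {
  # Drop rows with missing or negative sales, then sort by product and week
  df <- df[!is.na(df$sales) & df$sales >= 0, ]
  df[order(df$product, df$week), ]
}

forecast_products <- function(df, horizon = 4) {
  # Placeholder model: mean of the last `horizon` observations per product.
  # The real machine learning model would replace this function.
  per_product <- split(df, df$product)
  fc <- vapply(per_product,
               function(p) mean(tail(p$sales, horizon)),
               numeric(1))
  data.frame(product = names(fc), forecast = unname(fc),
             stringsAsFactors = FALSE)
}

run_pipeline <- function(input_path, output_path) {
  # Read, clean, forecast, and save the results for the dashboard to load
  raw <- read.csv(input_path, stringsAsFactors = FALSE)
  forecasts <- forecast_products(clean_sales(raw))
  saveRDS(forecasts, output_path)
  invisible(forecasts)
}
```

The idea is that a scheduler would call something like `run_pipeline("sales.csv", "forecasts.rds")` once a week (both file names are made up here), and the Shiny app would read the saved file.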
I have one data engineer on my team, and his idea is to use Azure Databricks. On Databricks we would have to use the notebook interface for the pipeline to work properly (at least that's how I understand it after speaking to the data engineer).
However, I really don't like the notebooks and I want to stay within RStudio the whole time. I know I can use RStudio on Databricks, but I'm not sure how to write a script in RStudio on Azure Databricks so that I can use this script in the pipeline.
Every time I terminate the cluster on Databricks I lose everything I created in RStudio.
So, what is the best approach if I want to stay in RStudio on Azure and automate the steps above?