Equivalent of JupyterHub with RStudio (Cloud?)

Berkeley has what looks like a really great setup for their data science courses (see Zero to Data 8). Their approach relies on JupyterHub, but I'd prefer something based on R and the RStudio IDE that also allows us to use some Python (e.g., via reticulate).

I just set up some Docker containers with RStudio and Shiny (GitHub - radiant-rstats/docker: Use R, Rstudio, Shiny, Radiant, Python, and Jupyter in a Docker container) and am thinking about how best to make these types of environments accessible to our students. @mine has posted some great resources for teaching with a Docker-based toolchain (see links below), but it looks like this requires a skill set we don't currently have in-house.

I'm interested in hearing your views on zero-to-data-8 and whether something similar could be done with R and RStudio.

I think RStudio Cloud could be a solution here. Have you tried it? It is in alpha, so feedback from you and your students would be valuable in helping the team get it right.

As for the zero-to-data-8 setup you mentioned, I'm not sure it is as easy as you think. The first thing they suggest is setting up a Kubernetes cluster. I'm not a DevOps guy, but I work with them, and in my experience Kubernetes is anything but easy. So if you don't have people who can work with Docker reliably, I doubt you'll have people who can do the same with Kubernetes.

@vnijs I agree that RStudio Cloud is the solution with the least technical overhead here. I was very lucky to have university IT interested in helping me solve my server access/authentication issues when we set up the system linked to in your post. This past semester I used RStudio Cloud, and I'm currently writing up how to set up a class on RStudio Cloud. I'll share a link here in a little bit. I also recently gave a talk on it at eCOTS 2018; the slides and video cover some of the details: http://bit.ly/frictionless-onboard, https://www.causeweb.org/cause/ecots/ecots18/tech-talk/4. I'll be back with more...

I also agree with @mishabalyasin that the zero-to-data-8 approach is not that easy. I think, especially in an educational setting, it would similarly require local IT help to get things going.

Thanks @mishabalyasin and @mine. I agree that the zero-to-data-8 approach is not straightforward either. However, perhaps this could be managed at the university level, with instructors then supplying Docker containers to be made available to students.

That said, RStudio Cloud looks very promising. Thanks for sharing the slides and video, @mine! The ability to provide a base project template with the package versions students should use is great, although I noticed that installing some packages from source can cause problems (e.g., readr or xgboost).

A key concern for me with moving to the cloud is how to access files on the server from a Shiny app and extract their file paths. I use Radiant extensively with my students, and being able to access files and file paths is important. It works fine when everything is installed on laptops but is not as smooth (yet) on a server (see, e.g., this issue).

Sharing projects through the cloud and/or Git looks very convenient on RStudio Cloud as well. @mine, do you by any chance also have resources on using GitHub Classroom? I have been using GitLab with an RStudio add-in I developed (https://vnijs.github.io/gitgadget) that lets the instructor create and fork repos for students and then create merge requests for them after the due date. The main issue with previous Git-based class solutions I have seen is that once students submit their work, they can see each other's submissions as well. Does GitHub Classroom address that issue?

For classes where instructors use Python, we would also need access to Python and JupyterLab, which I assume is not possible with RStudio Cloud. I'd prefer to use RStudio for this as well, but of course we can't make instructors use it :slight_smile:

@vnijs,
I would love to know a bit more about what "extracting the file paths" means. Are you saying that you want the Shiny applications that are part of the project to write to a location that the IDE can then use? If so, is there any issue with having one subdirectory that contains the application and a specified data directory that is accessible from the IDE?

My apologies if I am not understanding the use case properly. We haven't considered hosting JupyterLab on rstudio.cloud, but if it turns out that enough people want or need it, we could certainly look at what it would take.

The benefit of RStudio Cloud would be having all materials and data on the server. With some of our students we would mostly use code, but with others we'd also want to use Shiny apps and gadgets. Of course, every assignment and case is different, so to make this work well the Shiny apps would need the flexibility to load data from the server. You could constrain an app to access data in only one directory, but then it wouldn't be very broadly useful.

If you run an app from RStudio Server, you can use the file browser from rstudioapi to load files on the server. Shiny, in contrast, uses the web browser's file dialog to upload files from, and save files to, the client's computer. See https://vnijs.shinyapps.io/radiant/ for an example.

There are still some issues, however, with using rstudioapi::selectFile on RStudio Server or RStudio Cloud (see https://github.com/rstudio/rstudioapi/issues/91#issuecomment-39484). shinyFiles allows access to the server file system, but the package is no longer actively maintained and doesn't work with renderUI. This is the main issue holding us back from using RStudio Server or RStudio Cloud more extensively.

In sum, I think having a good option to access files on the server from Shiny apps, e.g., through rstudioapi, would be a huge benefit and would make RStudio Cloud a much more attractive alternative to running everything from students' laptops.
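To make the server-side option concrete: a server-side file picker, as opposed to the browser's upload dialog, typically restricts browsing to a few configured root directories (shinyFiles works along these lines) and rejects any path that resolves outside them. Below is a minimal sketch of that check, written in Python purely for illustration; `resolve_in_roots`, `list_files`, the root configuration, and the list of data-file suffixes are all hypothetical, and a Shiny app would implement the same logic in R.

```python
from pathlib import Path

# Hypothetical data-file extensions the picker should show.
DATA_SUFFIXES = (".csv", ".rds", ".rda")


def resolve_in_roots(roots: dict, root_name: str, relative: str) -> Path:
    """Resolve `relative` under the named root, refusing anything that escapes it."""
    if root_name not in roots:
        raise KeyError(f"unknown root: {root_name}")
    base = Path(roots[root_name]).resolve()
    # resolve() collapses '..' components and follows symlinks before the check
    candidate = (base / relative).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"{relative!r} is outside root {root_name!r}")
    return candidate


def list_files(roots: dict, root_name: str, relative: str = ".") -> list:
    """Return sorted (name, is_dir) pairs: subdirectories plus recognized data files."""
    directory = resolve_in_roots(roots, root_name, relative)
    entries = []
    for entry in sorted(directory.iterdir()):
        if entry.is_dir() or entry.suffix.lower() in DATA_SUFFIXES:
            entries.append((entry.name, entry.is_dir()))
    return entries
```

Configuring several roots (for example, the project directory plus a shared data folder) is one way around the single-directory limitation while still keeping the app from wandering across the whole server.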

Is the data being loaded read-only, or do you want students to be able to write to the "server"?

If it is read-only, it may make sense to show you some of our ideas around data sets.

Sharing larger data sets as read-only through some sort of shared folder would certainly be convenient.

Ideally, though, the user would be able to load and save any selected data set in the directories accessible (and writable) to the user.
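One way to support both cases, sketched under the assumption that the app process runs under the student's own account (so the operating system's permission checks reflect what that student may do): test each candidate directory for read and write access, offer readable ones for loading, and offer only writable ones for saving. The Python below is illustrative only; `classify_dirs`, `save_dataset`, and the directory layout are hypothetical names, not part of RStudio Cloud or any R package. Note that `os.access` checks the process's real user ID, so a process running as root will see every directory as writable.

```python
import os
from pathlib import Path


def classify_dirs(candidates):
    """Split candidate directories into load targets (readable) and save targets (writable)."""
    load_targets, save_targets = [], []
    for d in map(Path, candidates):
        if not d.is_dir():
            continue  # skip missing paths and plain files
        d = d.resolve()
        # X_OK on a directory means it can be entered/traversed
        if os.access(d, os.R_OK | os.X_OK):
            load_targets.append(d)
        if os.access(d, os.W_OK | os.X_OK):
            save_targets.append(d)
    return load_targets, save_targets


def save_dataset(text, directory, filename, save_targets):
    """Write `text` to directory/filename, but only into a directory classified as writable."""
    directory = Path(directory).resolve()
    if directory not in save_targets:
        raise PermissionError(f"{directory} is not a writable location for this user")
    # keep only the final name component so `filename` can't redirect the write elsewhere
    target = directory / Path(filename).name
    target.write_text(text)
    return target
```

Shared course data could then sit in a read-only directory that appears only in the load list, while each student's project directory appears in both lists.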