RStudio IDE DevOps (Maybe Docker)


#1

Hello! Love the idea of this community site and want to thank Hadley, Joe, & Garrett + others involved for creating / maintaining this platform!

About me:
Software engineer working on DevOps / Scala, and now R, with Python experience

Would love to gather people using / maintaining the RStudio IDE internally (with or without Docker) to help each other and brainstorm feature suggestions / requests!

Having a centralized RStudio instance has helped tremendously with package / resource / internal data maintenance.

The one pain point we have had is session maintenance: from time to time I’ve seen sessions left hanging and consuming resources.


#2

And please ping me if this post / topic is better suited for other forums and I’ll migrate my question there!!!


#3

Hi devon, welcome to the community! We’re hoping to open a forum in the future specifically for sysadmins/devops, but for now this is indeed the best place to discuss RStudio Server admin topics.

Are you using RStudio Server Pro? We’ve added a new feature in 1.1 which will allow you to tell the server to forcibly terminate/clean up sessions that haven’t been used for a configurable length of time. More here:

http://docs.rstudio.com/ide/server-pro/1.1.365/r-sessions.html#workspace-management


#4

Thanks for the fast response and for the information! I’ll keep up to date on the RStudio blog. Is there a way to have me included or updated when that sysadmin/devops forum is released?? Again, appreciate the help here!

Unfortunately we are on the community edition. I’ll read through the link you sent now! Any tips on handling this in the community edition?


#5

I would think that you could use some kind of cron script to terminate sessions that go past a certain age. It wouldn’t have the same level of visibility into whether or not someone had actually used the session in the last X hours/days, but it could work in some usage scenarios. The killall command should do it:
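A minimal sketch of that cron-driven approach, assuming the psmisc version of killall (which provides --older-than) and that RStudio Server sessions run as processes named rsession. The function name kill_old_sessions and the DRY_RUN switch are made up for illustration. Note that this keys off how long ago the process started, not how long it has been idle.

```shell
# Sketch only: end R session processes that started more than AGE ago.
# Assumption: psmisc killall is installed and sessions appear as "rsession".
# AGE uses killall's units: s, m, h, d, w, M, y (for example 7d).
# Set DRY_RUN=1 to print the command instead of running it.
kill_old_sessions() {
  age="${1:-7d}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "killall --older-than $age rsession"
  else
    killall --older-than "$age" rsession
  fi
}
```

Calling kill_old_sessions 7d from a nightly root cron job would end any session process started more than a week ago. Running it with DRY_RUN=1 first shows the command without killing anything.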

I expect someone needing anything more complicated than that is going to be pushed towards the Pro edition, where they added the feature specifically for people who pay. :wink:


#6

hahahaha. You have a fair point there nick :rofl:


#7

The main challenge with the cron script is going to be figuring out how to terminate sessions that have been unused for a certain length of time; for instance, you wouldn’t want to terminate a session that someone is currently using!

It’s not documented or supported, but there is a last used time for each session in ~/.rstudio/sessions/active/session-XXX/last-used that might help if you want to try to roll your own session cleanup.
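As a starting point for a home-grown cleanup, here is a rough sketch that lists session directories whose last-used file looks stale. It assumes the file's modification time tracks the last use of the session. The file's contents are undocumented, so the sketch does not try to parse them. The function name stale_sessions and its arguments are hypothetical.

```shell
# Sketch only: print session directories whose last-used file has not been
# modified in more than DAYS days.
# Assumption: the file's mtime reflects when the session was last used.
# Usage: stale_sessions HOME_ROOT DAYS
stale_sessions() {
  home_root="${1:-/home}"
  days="${2:-10}"
  find "$home_root" \
    -path '*/.rstudio/sessions/active/session-*/last-used' \
    -type f -mtime +"$days" -exec dirname {} \;
}
```

For example, stale_sessions /home 10 only prints candidate directories. Deciding what to do with them (and with the matching session process) is left to the caller.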


#8

I commonly use a one-liner like find /home/*/.rstudio/sessions/*.Rdata -mtime +10 -exec rm -rf {} \; to remove user session files that haven’t been modified in over 10 days.
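Since a cleanup like this deletes files across every home directory, it can help to preview the matches first. The sketch below wraps the same pattern in a function that lists by default and only deletes when asked. The name old_session_files and the "delete" argument are invented for illustration, and the .Rdata location is taken from the one-liner above rather than from any documentation.

```shell
# Sketch only: list (default) or delete *.Rdata files directly under each
# user's .rstudio/sessions directory that are more than DAYS days old.
# Usage: old_session_files HOME_ROOT DAYS [delete]
old_session_files() {
  home_root="${1:-/home}"
  days="${2:-10}"
  action="${3:-list}"
  if [ "$action" = "delete" ]; then
    find "$home_root"/*/.rstudio/sessions -maxdepth 1 -type f \
      -name '*.Rdata' -mtime +"$days" -exec rm -f {} +
  else
    find "$home_root"/*/.rstudio/sessions -maxdepth 1 -type f \
      -name '*.Rdata' -mtime +"$days" -print
  fi
}
```

Running old_session_files /home 10 shows what would go. Adding delete as a third argument removes those files.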


#9

Yes, having a decent-sized central VM that everyone can use as their development area has really increased the productivity and consistency of using R at my company. I let users install packages into their own personal library, but also have a standardized and approved system library that makes deploying R code to production much simpler. I also find that using package dependency tools like packrat + Docker makes deployment way more maintainable and less prone to errors such as package upgrades that deprecate functions.

I’ve made a few shell scripts that run weekly to clear up space on the shared VM, so I’m not really having problems with hanging sessions anymore, but I do see very inconsistent VM usage, with some people using all the resources and causing problems for other users. It would be really awesome to have a more auto-scaling platform.

JupyterHub can run each user’s server in its own Docker container, tied to the Linux user accounts of the main VM, thus isolating resources and preventing any one user from taking over the entire machine. I’d like to eventually get my setup to something similar – one central VM with each individual having their own RStudio instance as an isolated container tied to their Linux user. Or maybe deploying on top of Kubernetes and allowing it to decide when to add/remove pods as necessary.