Resources for new R admins


#1

I am about to embark on the R admin journey. I'll be managing a small setup with RStudio Server Pro and RStudio Connect installed on the same server, with no dev/test server. I have looked at the admin guides and have some familiarity with Linux but no prior experience as a Linux admin. I will be working closely with IT for security and authentication, but I think I will be managing the software beyond that.

What resources or advice are available for new R admins beyond the admin guides and this forum? I'm interested in getting a sense of what this role entails for others and what the most difficult parts of the job are.

I really enjoyed Nathan's talk from rstudio::conf 2018 (https://www.rstudio.com/resources/videos/the-r-admin-is-rad-a-guide-to-professional-r-tooling-and-integration/).

Looking forward to diving in!


#2

Exciting stuff! Looking forward to hearing how it goes! Some resources that may be helpful:

A bit of an overview:

On learning Linux:
https://seankross.com/the-unix-workbench/index.html

A relatively new Github org focused on this problem and these types of resources:
https://github.com/sol-eng/

Nathan's resources from the talk you mention:

More on how we think about R installations in a server environment:

Ultimately, getting familiar with the UNIX terminal will be a huge asset (things like cd, ls, grep, system daemons, config files, vim, and the like). You can also practice building R from source, which is a good skill to have on hand. I think @nathan is doing a webinar on the topic this week, and there is a budding website at production.rstudio.com.
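
For anyone who has not used the terminal much, here is a small practice sketch of the kind of commands mentioned above. It only touches a throwaway directory, and the file name and settings in `example.conf` are made up for practice (this is not a real RStudio config file).

```shell
#!/usr/bin/env sh
set -eu

# Work in a throwaway directory so nothing on the real system is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a small, made-up config file to practice on.
cat > example.conf <<'EOF'
# example settings
port=8787
auth=pam
EOF

# List the files in the current directory.
ls -l

# Find the line that sets the port.
port_line="$(grep '^port=' example.conf)"
echo "$port_line"   # prints: port=8787

# Clean up the practice directory.
cd /
rm -rf "$workdir"

# On a systemd-based server, daemons are usually inspected with systemctl, e.g.:
#   systemctl status <service-name>
```

Opening `example.conf` in vim before the cleanup step is a low-risk way to practice editing config files as well.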


#3

What I found most helpful for testing and experimentation was downloading and using docker. If you're on Windows, grab docker for windows. If mac, then just install docker. Once you have docker, you can grab the image of your choice (ubuntu, centos) that matches what you're using in production.

This helped me enormously. You can try installing R, installing or upgrading packages, and installing OS dependencies for those packages in your docker container. If you screw it up - who cares! Throw the container away and start a new one.
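
As a rough sketch of that workflow (not a definitive recipe): the script below starts a throwaway Ubuntu container, installs R from the distribution's packages along with a couple of OS libraries, and then installs one R package. The image tag `ubuntu:22.04`, the choice of the `curl` package, and the `DRY_RUN` switch are my own assumptions for illustration; swap in whatever matches your production server.

```shell
#!/usr/bin/env sh
set -eu

# Assumption: pick the image tag that matches your production OS.
IMAGE="${IMAGE:-ubuntu:22.04}"

# Commands to run inside the container.
SETUP="$(cat <<'EOF'
set -e
export DEBIAN_FRONTEND=noninteractive   # avoid interactive prompts (e.g. tzdata)
apt-get update
apt-get install -y r-base libcurl4-openssl-dev libssl-dev
Rscript -e 'install.packages("curl", repos = "https://cloud.r-project.org")'
EOF
)"

# DRY_RUN=1 (the default here) only prints what would happen.
# Set DRY_RUN=0 on a machine with docker installed to actually run it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "Would run: docker run --rm $IMAGE bash -c <setup commands>"
else
  # --rm throws the container away as soon as the commands finish.
  docker run --rm "$IMAGE" bash -c "$SETUP"
fi
```

If a step fails, nothing is left behind: fix the command list and run it again in a fresh container.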

If you're new to docker as well, I've got a cheat sheet that I put together that should help you get started.


#4

Thanks! I am new to docker. What is the difference between docker and a preconfigured virtual machine? A cheat sheet would be helpful.


#5

Here are some resources to get started:

https://www.docker.com/what-container


What is important to know is that it's very easy to pull pre-built container images from https://hub.docker.com/explore/

You want an ubuntu container? From your command line (once docker is installed) just run "docker run -it ubuntu bash", and docker will pull an ubuntu image from hub.docker.com, and launch you right into the container as root.

You want a centos container? Just run "docker run -it centos bash".

Let me see what I can do about getting you the cheatsheet.


#6

Docker is designed in a way that makes it much more lightweight than a virtual machine. This makes it very nice for the type of throw-away exploration that @snkfischer is talking about (or microservices, etc.). A container also does not have its own kernel (it uses the host machine's kernel), which makes containers much less resource-intensive and quicker to start.

On a typical workstation, a few VMs will cripple the host machine's resources because you are running several operating systems concurrently (and lots of overhead with it). With Docker, it's way easier to run a whole bunch of containers concurrently without resource issues, and it pretends to have a separate operating system, so you still have some really helpful isolation. There are some nuances to this, because this does mean it is more tightly coupled to your OS than a VM will be, but that's what the community here, the Docker community, and documentation are for! :slight_smile:


#7

Thanks @cole. That is a helpful explanation. I tried installing Docker on my MacBook Pro last night, and it turns out my CPU is not supported. My computer is old but still works well for most applications since I upgraded the hard drive to an SSD. However, it seems like Docker is not going to work. I might try using another computer though.


#8

Ah I'm sorry to hear that! You could always try installing an older version of Docker, perhaps? I know very little about CPU compatibility :slight_smile: This is a little funky (and definitely more complex), but you could also try installing docker in a VM on your computer :open_mouth: I'm not sure if that abstracts the CPU enough to trick docker into installing or if you'll just end up at the same road-block.

The other option is something like one of the cloud providers (AWS, GCP, etc.). Most offer a free tier for a year or a free credit or something that you can use to spawn a small linux machine in the cloud. You can install docker on that little machine and then play with it through an SSH session. There are benefits of being forced to use the terminal and learning some more complex networking stuff! Of course, it is a bit more work to do so.


#9

This thread has turned into "How do I run R with Docker?". Kelly O'Briant on the RStudio team recently gave this presentation at UseR! 2018:


#10

There's video of this talk as well:


#11

Hey @ablack3! Fellow newbie here. I'm in almost the exact same boat as you: managing a small setup with RStudio Server Pro and no prior Linux admin experience. Eager to hear how your journey has been over the past couple of months. I'm especially interested in discussing authentication, automation and package management. I'd be happy to share what I've learned and found most helpful!


#12

Welcome, @dolan ! I'd recommend checking out other discussions in the #r-admin and #r-admin:package-manager topics, as they sound right up your alley! Lots of us are interested in helping - feel free to start a topic with any questions you might have, or share any of your experiences, as I'm sure they will benefit others!


#13

This thread also seems like a good place to highlight that registration for #rstudio-conf has been open for a while, and there is a 2-day workshop focused on training for the R admin (specifically those who are administering RStudio Professional products). More information below:


#14

Hi @dolan,

Well, it has been a challenge, like anything new tends to be. It took many months of work to get the approvals to purchase the software and get R "through the front door". Once we purchased the software, it took almost three months to get a Linux server set up and the RStudio software installed. There is still more to do before the server can be rolled out to other users. I am using it for my work and it is sweet. In our organization, IT is siloed in its own building, far removed from where I work. Progress seemed to go fastest when I was able to work in the IT building directly with the Linux admin. In my experience everyone in IT is very nice and helpful, but it really helps if you have face-to-face contact and can spend some time working together on the setup.

Authentication was by far the biggest technological hurdle and a big reason why the setup has taken so long. This is an area I know almost nothing about, and I would prefer not to get into the weeds of it. We used PAM with Active Directory for both RStudio Server Pro and RStudio Connect. RStudio was very helpful, providing some support on the phone to help us work through issues. Everything works now, but there are still some lingering issues to resolve that I don't fully understand. I do think it will be worth going through the pain of integrating the RStudio software with our existing systems. Using PAM allows database authentication from within the RStudio Server IDE to be effortless, which is great.

The actual process of setting up the server, from my point of view, was just installing a bunch of dependencies on the Linux server, building a couple of versions of R from source, and then installing the RStudio software. All of that went pretty smoothly. I had practiced building R from source a few times using Docker on my laptop, as others have mentioned on this thread.
The most common hiccup seemed to be getting some weird error, googling the error, and installing some Linux package that fixed the error. I actually do not have rights to install Linux packages. The Linux admin installed any packages I needed for me and gave me rights to run specific commands as root using sudo. Also, I should mention that a couple of times things were not working one day and then started working the next without us doing anything. Don't ask me why.
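
For anyone wondering what "building R from source" looks like, here is a minimal sketch, not the exact commands we ran. The version number `4.3.2` and the `/opt/R/<version>` install prefix are assumptions for illustration; a versioned prefix is one way to keep a couple of R versions side by side. The `build_r` function is defined but not called, so the script only prints what it would use.

```shell
#!/usr/bin/env sh
set -eu

# Assumption: the R version to build; override with R_VERSION=x.y.z.
R_VERSION="${R_VERSION:-4.3.2}"
R_MAJOR="${R_VERSION%%.*}"   # "4" for 4.3.2

# CRAN keeps source tarballs under src/base/R-<major>/.
SRC_URL="https://cran.r-project.org/src/base/R-${R_MAJOR}/R-${R_VERSION}.tar.gz"

# Assumption: one install directory per version.
PREFIX="/opt/R/${R_VERSION}"

build_r() {
  curl -fsSLO "$SRC_URL"                 # download the source tarball
  tar -xzf "R-${R_VERSION}.tar.gz"       # unpack it
  cd "R-${R_VERSION}"
  ./configure --prefix="$PREFIX" --enable-R-shlib
  make
  make install   # needs write access to $PREFIX, e.g. via sudo
}

echo "Source: $SRC_URL"
echo "Prefix: $PREFIX"

# Uncomment on a machine (or container) with the build dependencies installed:
# build_r
```

When `./configure` stops with an error about a missing library, that is usually the point where a Linux package needs to be installed before trying again.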

I'm still trying to figure out package management. I had been thinking I would just install all the packages people want to use in the global shared library on the server but @cole strongly recommended against this approach, and I plan to heed his advice. The best approach for controlling packages on the system seems to be to set up an internal CRAN mirror and there are a few options I am exploring.
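
One piece of this I have sketched out so far: pointing R sessions at an internal repository by default through a site-wide `Rprofile.site`. The repository URL below is hypothetical, and the script writes the file to a temporary directory rather than the real location so it is safe to try.

```shell
#!/usr/bin/env sh
set -eu

# Hypothetical internal repository URL; replace with your own mirror.
REPO_URL="${REPO_URL:-https://cran.example.com}"

# Write to a scratch directory instead of the live R installation.
OUT_DIR="$(mktemp -d)"

cat > "$OUT_DIR/Rprofile.site" <<EOF
# Point every R session at the internal repository by default.
local({
  options(repos = c(CRAN = "$REPO_URL"))
})
EOF

cat "$OUT_DIR/Rprofile.site"

# On a real server, the site-wide file lives in the etc/ directory of the
# R installation (the path printed by "R RHOME", plus /etc/Rprofile.site).
```

With that in place, a plain `install.packages()` call would pull from the internal repository unless a user overrides `repos` themselves.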

Takeaways from my experience so far:

  • Expect the setup to take a while and to run into issues (hopefully you will be pleasantly surprised)
  • Befriend the IT folks/Linux admins and try to get some face-to-face time to work on the server together
  • It can be tough continuing to do your normal day job while also trying to get the R infrastructure working, so a supportive manager is very helpful

All that said I am really looking forward to putting this software to use and doing some cool stuff with it. I am an R & RStudio superfan and believe that all the work to set up good R infrastructure will pay off.


#15

Also, this course on Udemy was very helpful.