How wide/deep should an organisation's R guidance go?


#1

I’m looking to develop some guidance around how people in my organisation should use R, including (but not necessarily limited to):

  • In-house style guide (similar to Google’s and Hadley’s, i.e. basically the same)

  • Package choices (riffing off of RStartHere)

  • Version control (if anyone knows of good docs on using TFS with R, along the lines of Jenny Bryan’s happygitwithr, now is the time…)

  • Project structure (something like rOpenSci’s rrrpkg)

You get the idea…

I thought this would be a good time to see if people had ‘stories from the frontline’ (I will avoid war imagery when distributing any coding guidelines) they would like to share, esp. the successes and challenges of trying to establish this in your teams. I’ve come across Airbnb’s recent R preprint, but it would be cool to hear what experiences others have had. How did you do it, and what did you learn?


#2

Ewen,

You’re right, there is no single place where you can go to get all the best practices for R. The good news is that there is a lot of great content out there today (I like your list). As for your question of how wide/deep you should go, I would break best practices into two initiatives for your organization:

  • Educating R users on best practices. This includes things like getting the most out of the IDE, using the tidyverse, and building Shiny apps, to name just a few.
  • Administering R tools. This includes building the right systems (RStudio Server and RStudio Connect) as well as integrating them into other systems (Git, databases, Spark, etc.).

When I worked in client services, I got to help other companies onboard analytic platforms. They always had the same two questions: how do we build the best tools, and how do we train our analysts to use them?


#3

Hi Ewen,

Here’s my experience from the trenches. I work as a data scientist at a big Canadian bank and have been pushing R for over a year. I started using R as a data analyst, looking for a way out of Excel hell. One of the larger pieces of work our team was doing was an annual process that required lots of iterations, a lot of (ever-updating) data, regular outputs often in graphical format, and a fast turnaround. If only there were a tool that perfectly aligned to all of those needs.

Well, that tool was forever Excel. When I joined the team I pushed hard to use R. It was initially met with a lot of resistance. The VP called it a ‘black box’, never mind that I could show him the exact code that produced the exact output. I started posting on our internal social networking site about the dangers of using Excel, and about how other tools, ahem, might be better suited to certain tasks. It gained a lot of traction internally. I kept posting about R until I became known as the R guy.

My VP was still resistant, so I had a strategy for him. He wanted Excel outputs so he could fiddle with numbers, so I used R to output to Excel, while also giving him the benefits of the real things he wanted: charts, fast turnaround, accuracy, better insights. Eventually, I noticed he wasn’t even looking at the Excel files I’d send him, but only at the PDFs and slides I created in Markdown.

That small success led to people approaching me, asking how they too could use R. I started with small training sessions with my team, and later expanded that to regular bootcamps with 30+ people attending. I’d teach the tidyverse first and skip base R as much as possible, as recommended by David Robinson, and found that this worked well.

Next I started applying some of the tips for internal R packages, following Airbnb’s example, to create an internal package that does things like set up our proxies and standardize our visual identity. Around the same time I published an in-house style guide. I started creating more posts showing the power of R using relatively public internal data, such as how we can identify communities and cliques by looking at who follows whom on our internal social network. I put up pretty graphs and people went wow.
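
To make the internal-package idea concrete, here is a minimal sketch of the kind of helpers such a package might export: one that sets the proxy environment variables for the session, and one that returns a fixed brand palette. The function names set_company_proxy() and company_palette(), the proxy URL, and the colour values are all hypothetical placeholders, not the poster’s actual package.

```r
# Hypothetical helpers of the kind an internal company package might export.
# The proxy URL and the colour values are placeholders, not real settings.

# Set the proxy environment variables for this R session. download.file()
# and many libcurl-based packages read http_proxy / https_proxy.
set_company_proxy <- function(proxy = "http://proxy.example.com:8080") {
  Sys.setenv(http_proxy = proxy, https_proxy = proxy)
  invisible(proxy)
}

# Return the first n colours of a fixed brand palette so every chart
# produced across the team uses the same colours.
company_palette <- function(n = 4) {
  brand <- c(navy = "#1F3A5F", teal = "#2A9D8F",
             gold = "#E9C46A", coral = "#E76F51")
  if (n > length(brand)) {
    stop("The brand palette only has ", length(brand), " colours.",
         call. = FALSE)
  }
  brand[seq_len(n)]
}

# Typical start of an analysis script: configure the session once.
set_company_proxy()
company_palette(3)
```

In a chart, the palette would simply be passed along, e.g. barplot(x, col = company_palette(3)). Packaging helpers like these means nobody has to copy proxy settings or hex codes between scripts.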

I’ve started traveling to other offices, on a different continent, to continue spreading the gospel of R. I give presentations to our students, as well as internal talks. I recently gave one on visualization using R and Shiny, and it was really well received. My favourite line from the talk was about how my boss loves Python and I love being right. It’s a slow process, but there is progress.

Personally, I use ProjectTemplate for all the work I do, but I haven’t forced it (or anything) on anyone. I don’t have that kind of clout, and resistance is so high here I have found that the carrot seems to work better than the stick.

One anecdote: I had a colleague who loved analysis, so I thought he would love R. I showed him all the things I could do, and he said “I could do that in Excel, and in half the time.” I tried to push R, but he resisted. I left that team 5 months ago. Looks like he’s missed me, because he recently came to a boot camp and was excited to show me all the neat tricks he’s learned in R. He just moved to a new team and was excited to show them what he’s learned.

That’s how I did it, and I’m by no means done. I’m still trying to build adoption across the organization. It’s easy to build it on my team; I’m a team of one. Our larger team of data scientists is also easy to convince; they all use either R or Python. It’s the rest of the org, tied to Excel, that is slow to convert. What I’ve learned is that persistence pays off, and that little victories are sometimes all you can hope for. There’s no replacing the power of teaching people in a boot-camp-style session. Half the people tend to fade out, but the other half seem to be incredibly engaged. Find ways to show the good work you can do. Make life easier. Reduce the friction for getting started as much as possible. There’s no point teaching Shiny before they’ve seen how easy it is to clean data and automate rudimentary tasks.

I don’t know if I’ve answered your questions, but there’s my experience for what it’s worth.


#4

I have worked at two big companies where I used R for bioinformatics.

At the first, there were zero guidelines or guidance for R use. No problems really, but the team could probably have been more effective with better collaboration if we had some guidance and tools to help us.

Where I work now, R is the official tool/language/platform for the bioinformatics team. I agree with Nathan that the administration/infrastructure support is key! Collaborating with co-workers is great when everyone shares all their code on GitHub. We also use tools that allow us to run RStudio in common environments so analyses are repeatable. This can help everybody use the same versions and not run into conflicts because I updated a package and someone else didn’t…
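
One lightweight way to catch the “I updated a package and someone else didn’t” problem is a base-R check against an agreed list of minimum versions. This is only a sketch: the function name check_pinned_versions() and the pinned list are hypothetical, and dedicated tools such as renv, which record exact versions in a lockfile, go considerably further.

```r
# Hypothetical helper: report whether each package in 'pinned' is installed
# at (or above) the minimum version the team agreed on.
check_pinned_versions <- function(pinned) {
  result <- data.frame(
    package   = names(pinned),
    required  = unname(pinned),
    installed = NA_character_,
    ok        = FALSE,
    stringsAsFactors = FALSE
  )
  for (i in seq_along(pinned)) {
    pkg <- names(pinned)[i]
    # requireNamespace() returns FALSE instead of erroring if pkg is missing
    if (requireNamespace(pkg, quietly = TRUE)) {
      have <- utils::packageVersion(pkg)
      result$installed[i] <- as.character(have)
      result$ok[i] <- have >= package_version(pinned[[i]])
    }
  }
  result
}

# Example pinned list (made up): base packages stand in for real dependencies.
check_pinned_versions(c(stats = "3.0.0", utils = "3.0.0"))
```

Running a check like this at the top of a shared script turns silent version drift into a visible table of which packages are missing or out of date.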

Just my thoughts!


#5

Great thread and great responses so far. I’ll try to be concise with ideas, but feel free to follow up if I’m unclear.

  • Use arrow assignment for objects; because the tidyverse lets you easily pipe data, you want to be explicit about direction (i.e. x <- my_df %>% sample_n(5) vs. my_df %>% sample_n(5) -> x)
  • do code reviews and praise positives instead of only criticism
  • advocate linters (I like lintr)
  • remember everything is a vector
  • use vector operations as much as possible
  • care about reproducibility and encourage trying to repeat other people’s analysis
  • don’t save .Rdata environments by default
  • write tests for code when appropriate
  • handle errors explicitly
  • avoid using the superassignment operator <<- unless you really need it
  • encourage sharing code through snippets, gists, R packages, shared folders, etc.
  • try doing pair programming with one junior and one senior (in terms of R knowledge) and let both get a chance to drive
  • support lunch and learn sessions or lightning talks or meetups, etc. to give people an opportunity to show off what they learned
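
Several of the coding habits above can be shown in one short base-R sketch: leading arrow assignment, a vectorised operation instead of a loop, explicit error handling, returning a value instead of using <<-, and a lightweight test. The names prices, checked_ratio(), and add_run() are made up for illustration; a linter such as lintr or a test framework such as testthat would formalise the same ideas.

```r
# 1. Assign with a leading `<-` so the reader sees the result name first.
prices <- c(10, 12.5, 9, 20)

# 2. Prefer vectorised operations: one expression covers every element,
#    no explicit for loop needed.
with_tax <- prices * 1.2

# 3. Handle errors explicitly: fail early with a clear message...
checked_ratio <- function(num, den) {
  if (any(den == 0)) stop("den must not contain zeros", call. = FALSE)
  num / den
}
# ...and decide at the call site what should happen when it fails.
ratio <- tryCatch(
  checked_ratio(1:3, c(1, 0, 2)),
  error = function(e) {
    message("Falling back to NA: ", conditionMessage(e))
    rep(NA_real_, 3)
  }
)

# 4. Avoid `<<-`: return the new value instead of modifying a global.
add_run <- function(counter) counter + 1
runs <- 0
runs <- add_run(runs)

# 5. A lightweight test; stopifnot() errors if any condition is not TRUE.
stopifnot(
  isTRUE(all.equal(with_tax, c(12, 15, 10.8, 24))),
  all(is.na(ratio)),
  runs == 1
)
```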

There’s a ton more I could go into regarding administrative stuff, but I think the interest and community aspects are more important.


#6

I work in a group of analysts whose side-line is being R “evangelists” inside a financial services company. Some things we’ve done:

  • Spin up and configure a large cloud machine with 16 cores, 100 GB of RAM, several terabytes of disk, RStudio Server, Shiny Server, and plenty of pre-loaded packages. Then invite other departments to use that R environment, knowing we’ve eliminated the major start-up hurdles.
  • Be available for internal consulting, willing to help as we are able.
  • Hold regular “Friday Tech Talk” sessions, where we introduce and explore the topics around R.
  • Produce apps and reports that knock people’s socks off. Tell them we used R.
  • Create helpful, local packages. Create a local package repository where folks can easily download them.
  • Give away books on getting started with R. (Yes, we do.)
  • Send group e-mails when we discover cool packages and articles that folks could use.
  • Encourage people to attend R-themed conferences. (Yes, they do.)
  • Be certain that senior executives know about R and what we’re accomplishing with it, even if they don’t completely understand what it is.
  • Show other analysts how to enhance their career path by learning and using R.
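
For the local package repository, one common pattern is to add the internal repository to the repos option (for example in a site-wide Rprofile.site) so install.packages() sees in-house packages alongside CRAN. The sketch below assumes a CRAN-style directory served over HTTP, whose index files can be built with tools::write_PACKAGES(); the URL and the package name are placeholders.

```r
# Minimal sketch: make an internal repository visible alongside CRAN.
# "https://r-repo.example.internal" is a placeholder URL for a server that
# hosts a CRAN-style directory (index built with tools::write_PACKAGES()).
options(repos = c(
  internal = "https://r-repo.example.internal",
  CRAN     = "https://cloud.r-project.org"
))

# Colleagues can then install in-house packages exactly like CRAN ones, e.g.
# install.packages("ourcompanyR")   # hypothetical package name
getOption("repos")
```

Putting this in a shared startup file means nobody has to remember the repository address, which removes one more start-up hurdle.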

Finally, we are not trying to replace Excel. Rather, we want to co-exist with Excel. Our slogan is, “The right tool for the job”; and sometimes Excel is the right tool.


#7

I think one of the important ideas to keep in mind is that your guidance should support and enable people’s work, rather than feel like an enormous set of rules that people have to follow.

One way to accomplish this is to use the guidance and tools you’re providing to solve obvious problems everyone faces. In my organization, that problem has been the difficulty of accessing data from the 20+ data collection applications we currently maintain. Our approach, then, has been to also use the style guide to provide code snippets and other tips on how to access data sources and join them together in the quickest, most effective, and most painless way. That’s the kind of advice you can’t find by Googling. If you focus on driving value for the analysts, they’ll come along with you more readily.
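
As an illustration of the kind of snippet such a style guide might carry, here is a hypothetical base-R example that joins extracts from two separate applications on a shared key. The data frames, column names, and values are made up; in practice each extract would come from a database query (for example via DBI::dbGetQuery()) or a file export.

```r
# Hypothetical extracts from two data collection applications.
survey <- data.frame(
  site_id = c(101, 102, 103),
  visits  = c(12, 7, 30)
)
registry <- data.frame(
  site_id   = c(101, 103, 104),
  site_name = c("North", "East", "West"),
  stringsAsFactors = FALSE
)

# Join on the shared key. all.x = TRUE makes this a left join: every survey
# row is kept, and sites missing from the registry get NA instead of
# silently disappearing from the result.
combined <- merge(survey, registry, by = "site_id", all.x = TRUE)
combined
```

Documenting which key links which systems, and which kind of join is safe, is exactly the local knowledge analysts cannot find elsewhere.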