Pre-teaching R. What's your best argument for switching to R from Excel?


#1

Please move this topic to a different category if it doesn’t belong here


As the only one practicing R at my company, I sometimes feel lonely. And there are many other challenges.
Like this:

Over the past year, I’ve been trying to convince others to switch to R. That is, people who don’t directly report to me. But every time, my most powerful arguments, including reproducibility, speed, and others, hit the brick wall of “learning R, while Excel is learned and known”.

I work with people of different backgrounds, ages, scopes of responsibility, and seniority in the company. All would benefit from R. None are ready to learn.


Do you have any arguments that trump the “learning curve” counterargument?
Or is a top-down approach the only way? (It could be an option, and I have started looking in this direction, but I also want to spark desire and get people to learn R voluntarily, before we adopt some specific workflow that has R in it.)


#2

That’s a tough one. I’ve found that the best way to get people interested in R is to help them with little things that have a big impact and are easier to do in R than in Excel. The one that I find myself coming back to time and again is inner_join(). Perhaps finding some common pain points within your organization and developing quick solutions with R would be helpful!

Now in my experience, this doesn’t result in everyone switching to R, but it does open up the door for further conversations on how R can benefit someone’s work.
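As an illustration of the inner_join() point, here is a minimal sketch of the kind of lookup that usually takes a VLOOKUP in Excel. The tables and column names are invented for the example; it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration
orders <- data.frame(
  order_id    = 1:4,
  customer_id = c(10, 11, 10, 13),
  amount      = c(250, 40, 75, 120)
)
customers <- data.frame(
  customer_id = c(10, 11, 12),
  region      = c("North", "South", "East")
)

# Keep only the orders whose customer_id exists in both tables,
# and attach the matching region to each of them
orders_with_region <- inner_join(orders, customers, by = "customer_id")
print(orders_with_region)
```

The order for customer 13 is dropped because that customer has no match; left_join() would keep it with a missing region instead.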


#3

Hi. I’ve been advocating R at my company for the last couple of years (we pretty much had two extremes: analysts using Excel, possibly with some VBA, or developers using C# and F#).

Getting buy-in was a bit slow at first but I’d agree with @jessemaegan - I found that ggplot, htmlwidgets and shiny have been a tremendous help. The ability to create high quality visualisations and / or interactive access to those visualisations is extremely powerful.

The acquisition of Revolution by Microsoft has also helped. As an organisation, our production environments have all been Microsoft based and so the increase in R-related offerings from Microsoft has given comfort to people more distant from day-to-day use (not that I think that should be necessary but have to recognise that it has made conversations easier).

I’d also add that spending some time documenting / setting out a workflow (even getting your IT team to put together a scripted install of R) to help people get started with R in as painless a way as possible is also important. Nothing puts people off more than having errors that they don’t understand (e.g. Error in library(xyz) : there is no package called ‘xyz’). Anything you can do to make it easy to get going will help. If I could go back and do things differently, I’d ask our IT team to help set up RStudio Server (the rocker project is great to help with this: https://github.com/rocker-org/rocker-versioned).
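One small way to soften the “there is no package called” error is to check for missing packages before a script loads anything. A minimal sketch in base R; the function name `missing_packages` and the package list are made up for illustration, not part of any standard setup.

```r
# Return the names in `pkgs` that are not installed on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Hypothetical list of packages a team workflow might rely on
needed <- c("stats", "utils")
to_install <- missing_packages(needed)

# Install only what is missing (nothing happens if all are present)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Putting a check like this at the top of a shared script means a newcomer gets the packages installed instead of an error they cannot interpret.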


#4

You could always show them this: http://www.eusprig.org/index.htm.

More practically, I switched to gain access to a much larger toolkit. If I wanted to learn about and apply things like mixed and random effects models, decision trees, etc. I couldn’t use Excel anymore. Tidy data, shiny, R markdown and all the other stuff has just been bonus since I switched.


#5

I feel your pain, having experienced these frustrations myself!

I would totally agree with @jessemaegan when she says:

I would add to this that:

  • an easy win could be by focusing on the limitations of Excel and how these can be overcome with R. For example, if you have transactional records, you will soon hit the Excel limit of 1,048,576 rows; or trying to filter or use a pivot table with large volumes of data is much more painful than using dplyr commands such as filter or group_by. Show them how R can make their tasks and jobs easier.

  • senior executives often want data, visualisations etc., to exact specifications and in R, specifically with ggplot2, you can really produce unique and appealing visualisations that you couldn’t produce in Excel. Get the wow factor from your outputs and get colleagues and senior executives intrigued into how you produced your analysis and visuals.
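To make the first bullet concrete, here is a rough sketch of a filter plus group_by summary, which is roughly what a filtered pivot table does in Excel. The data and column names are invented; it assumes dplyr version 1.0 or later.

```r
library(dplyr)

# Hypothetical transactional records, invented for illustration
transactions <- data.frame(
  region  = c("North", "North", "South", "South", "East"),
  product = c("A", "B", "A", "A", "B"),
  amount  = c(100, 250, 80, 120, 300)
)

# Filter the rows, then summarise per region (the "pivot table" step)
summary_by_region <- transactions %>%
  filter(amount >= 100) %>%
  group_by(region) %>%
  summarise(n_sales = n(), total = sum(amount), .groups = "drop")

print(summary_by_region)
```

The same few lines work unchanged whether the table has five rows or several million, which is where the Excel row limit stops being a concern.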

I would advocate starting small and building up bit by bit. For example, when I started introducing R to the business I created visuals and plotted them in ggplot and simply copied and pasted them into Word as I believed that going straight to Markdown would have been too big of a step in one go. Don’t scare them - take your colleagues and the business on an R journey with you.


#6

I think it depends on the team, and the toolset they have grown used to using. Where I was previously working, our entire team used a piece of software called Alteryx, which has the same functionality as the tidyverse, but it comes with a high price ($) to use. Our mgmt bought into it because it’s ‘visual coding’ through a workflow, which is very easy to get up and running with. You still need to know the basic data transformation concepts, but it’s much easier to see what exactly is going on. I will say that vanilla R does present a relatively high bar to get up and running with, which can make it hard for others to see the value in it over proprietary software, at least in large corporations.

Personally, I’m a huge fan of visual coding, because it helps me think clearly about the problem I want to solve. Learning the concepts around tidyr is hard enough for a newcomer coming from Excel; if you also hit errors along the way (such as encoding, size of data, etc.), it’s just too overwhelming to piece together and see the value.

With that being said, not everyone has enough $$ to afford proprietary software, hence the reliance on open source tools to get work done. If you want someone to use software that is valuable for their job, don’t simply show and tell the software; teach them the core concepts around data wrangling, and show how the software makes those tasks simple and their life easier.


#7

The people in admin/analyst roles who come onto the R courses I run have a strong interest in being able to repeat reports by updating a data file in a folder, then knitting an Rmd template that produces this week’s/month’s report (and maybe rearranges the data and saves it as an Excel file to go with that).

So maybe some examples around repetitive report generation.
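A rough sketch of that workflow, under several assumptions: the function names, the folder layout, and a parameter called `data_file` are all hypothetical, and the Rmd template would need to declare `data_file` under `params:` in its YAML header. It uses `rmarkdown::render()`, so the rmarkdown package and pandoc must be available when the report is actually knitted.

```r
# Pick the most recently modified CSV in a folder.
latest_csv <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No CSV files found in ", dir)
  files[which.max(file.mtime(files))]
}

# Knit the template against the newest data file and save a dated report.
# `template`, `data_dir` and `out_dir` are placeholders for your own paths.
render_report <- function(template, data_dir, out_dir = "reports") {
  data_file <- latest_csv(data_dir)
  dir.create(out_dir, showWarnings = FALSE)
  rmarkdown::render(
    input       = template,
    params      = list(data_file = normalizePath(data_file)),
    output_file = paste0("report_", format(Sys.Date(), "%Y-%m-%d"), ".html"),
    output_dir  = out_dir
  )
}
```

With something like this in place, producing the next report amounts to dropping a new file into the folder and calling `render_report()` with the template path.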


#8

I had four arguments:

  • the potential for visualization: the charts from ggplot and choropleths from tmap are slick.
    More to the point, they give you results that are unachievable in Excel, no matter how hard you try.

  • its scripted nature: it might be a pain to create a report / a model at first, but it is super easy to rerun it with slightly altered data.
    Or to add a little for loop to prepare several variants of it with somewhat different parameters. This can be surprisingly hard in Excel, especially in a complicated spreadsheet that was not planned for it in advance.

  • R scripts can be scheduled (e.g. using cronR addin to RStudio Server). So your user, or rather his manager, can come to the office and have his reporting pack ready to read with his morning coffee.

  • R scripts are scripts, not worksheets, and as such are very transparent and resistant to overwriting data (or worse, formulae) during the process of report generation. In an Excel environment this can be a major problem.

Excel is ubiquitous, but it has its pain points - build on them!
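The “several variants with different parameters” idea from the list above can be sketched in a few lines of base R. The data, the function name `summarise_sales`, and the thresholds are invented for illustration.

```r
# Hypothetical sales data, invented for illustration
sales <- data.frame(
  region = c("North", "North", "South", "South", "East"),
  amount = c(100, 250, 80, 120, 300)
)

# One report variant: total sales per region above a minimum amount
summarise_sales <- function(data, min_amount) {
  kept <- data[data$amount >= min_amount, ]
  aggregate(amount ~ region, data = kept, FUN = sum)
}

# Rerun the same analysis for several thresholds
thresholds <- c(0, 100, 200)
variants <- list()
for (t in thresholds) {
  variants[[paste0("min_", t)]] <- summarise_sales(sales, t)
}

print(names(variants))
```

Adding a fourth variant means adding one number to `thresholds`, whereas a spreadsheet typically needs another copied tab.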


#9

I think a few more noteworthy points that go nicely with the points others have made:

  • R integration with version control (GitHub, etc.) for change monitoring and collaboration (Excel’s collaboration tools have always felt clunky). Granted, this will seem less like a feature to an Excel user base, but the ability to monitor and review changes at the character level has value. (No longer do I have to worry that someone lacking attention to detail is going to inadvertently clobber my work!)
  • Excel’s tedium when doing “simple” tidy data operations, transposing, etc. Typically a pivot table is the classic example, but reusing these values “in a pipeline” is painful and tedious.

For the second point, having a good command over the tidyr functions like gather, spread, complete, etc. and having confidence when explaining them can go a long way to illustrating the simple brilliance / power of the R language and its many packages.
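A minimal sketch of the gather / spread round trip mentioned above, with made-up data; it assumes the tidyr package is installed. Newer tidyr versions recommend pivot_longer() and pivot_wider() for the same job, shown here in comments.

```r
library(tidyr)

# Hypothetical "wide" table, as it might arrive from a spreadsheet
wide <- data.frame(
  region = c("North", "South"),
  Q1     = c(10, 20),
  Q2     = c(15, 25)
)

# Wide to long: one row per region and quarter
long <- gather(wide, key = "quarter", value = "sales", Q1, Q2)
# Equivalent in newer tidyr:
# pivot_longer(wide, c(Q1, Q2), names_to = "quarter", values_to = "sales")

# Long back to wide: one column per quarter
wide_again <- spread(long, key = quarter, value = sales)
# Equivalent in newer tidyr:
# pivot_wider(long, names_from = quarter, values_from = sales)

print(long)
print(wide_again)
```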


#10

Without knowing your situation it’s difficult to give advice, but I’ll give you the benefit of my experience in a very large company with a sprinkling of R being used in different areas but not enough for it to gain a fundamental foothold. I am almost the only one in my area (one other concentrates on databases and the like) and have thought about how to expand the user base.

  • Demonstrate the tangible benefits to managers (all the good stuff mentioned above), but don’t expect them to learn R. They’ve got underlings to do the dirty work.

  • Identify those most adept at Excel and/or SQL to see if any might be interested in the challenge to develop their skills - approach them individually if you can so there is no herd mentality.

  • Avoid the sly ones who will willingly hear what R can do, but use this info to palm off their own work onto the R users (in this case just you) or generate more work promoting it as their initiative. This may seem cynical but these types either climb the career ladder or work out how to make their lives as easy as possible. They may well not exist in your work-place.


#11

I’m having similar challenges in the research group where I am currently doing my PhD. My approach, which has been somewhat successful, has been to start developing an R package for the group, which contains common functions that everyone can benefit from.

We have a handful of experimental techniques that everyone uses from time to time, so I have created functions for easy loading, analysis and plotting such data. It is mainly from code that I have written for myself anyways, so putting it into a package has not been a ton of extra work. This lowers the barrier for some people, so I have gotten some people on board. However, it doesn’t help with the people who are scared of starting to do script-based analysis.

So if you can think of common tasks in your organisation, which could be streamlined through an R package, that might be a way to go.


#12

I think this is an excellent blog post about what it feels like to switch and what the payoffs are:


#13

This is me! At least in part; one of two really R-fluent members of a team that is supposed to be “learning R” but finds it difficult to switch over from Excel/SPSS. I’ve given up on trying to convince anyone that they should do it, mostly because I have to recognize that it would be a really big and difficult transition since they (mostly) aren’t coders and rely on point-and-click interfaces, and they just aren’t internally or externally motivated.

Instead I try to keep an ear to the ground for problems that they’re having with whatever they’re working in and, if I can find a solution in R I’ll put one together. I’ve written some tutorials that are tailored to the work we do. People will come to me if they need help un/stacking data (i.e. using spread or gather).

A huge thing for us lately has been Shiny applets–we do some applets in Excel but you can’t do advanced analysis or simulations there. Adding this to our arsenal has been really exciting.


#14

If your colleagues are using Excel for data munging, R would be a better alternative. If they are doing stats they are likely already using R.

But I’m going to play devil’s advocate here… Unfortunately, 9 times out of 10 your colleagues are right - Excel is a powerful, easy-to-use tool that is akin to pen and paper. It’s no conspiracy that Excel has stuck around for so long and will be here with us for the next 20 years (probably longer!). Its only major downside is reproducibility (but a good Excel user knows how to lay out their work so it flows well, and R doesn’t guarantee reproducibility either).

While switching to R will ease some part of their work, the benefit needs to be immediate and tangible for them to allocate time out of their day to learn R. Yes, we know the benefits of R, but as a long-time Excel user I tend to roll my eyes at the notion that Excel users in general should “just switch to R”. The most often-cited example of why people should switch, namely ggplot, is a great example of why people SHOULDN’T switch from Excel: ask an Excel user who hasn’t used R to make a mixed bar and line chart on a secondary axis in ggplot and watch them never touch R again. It’s an absolute pain to understand ggplot if all you want to do is plot some lines.

We have our church, they have theirs. We just have to suck it up and deal with it :slight_smile:


#15

Good points: there needs to be a compelling reason to convert.

For me (I was the Excel guy) the reasons were clear: handling over a million rows without a database and speed (join vs. vlookup).

I still think using ggplot as a selling point is a really strong one, but as for telling people they shouldn’t use a secondary axis with different scales, I’ve given up.


#16

I heard Hadley gets an Inspector-Dreyfus-twitch at the mere mention of secondary axes - but dammit they are just so useful! E.g., something like seeing frequencies within factor levels alongside the corresponding exposure / counts is really useful when doing any form of modelling.
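For anyone who wants to try the bar-and-line chart discussed here: ggplot2 supports a secondary axis only as a one-to-one transformation of the primary one, via sec_axis(). A minimal sketch with made-up data; the scaling factor `k` is an assumption chosen so that both series fill the same vertical range.

```r
library(ggplot2)

# Hypothetical monthly data, invented for illustration
df <- data.frame(
  month = 1:6,
  units = c(120, 150, 90, 200, 170, 210),
  rate  = c(0.10, 0.12, 0.08, 0.15, 0.11, 0.16)
)

# Factor that maps the rate onto the scale of the units
k <- max(df$units) / max(df$rate)

p <- ggplot(df, aes(x = month)) +
  geom_col(aes(y = units)) +       # bars read against the left axis
  geom_line(aes(y = rate * k)) +   # line is rescaled to the bar scale
  scale_y_continuous(
    name     = "Units",
    sec.axis = sec_axis(~ . / k, name = "Rate")  # undo the rescaling for the labels
  )

print(p)
```

The rescaling has to be done by hand, which is part of why this chart type feels harder in ggplot2 than in Excel.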


#17

This was one of the big factors I used in converting the business from Excel to R.


#18

Generalizing this point, another longer term advantage of R is that it makes you think about data visualization principles, data types etc., much, much more than Excel.

For a user who is willing, using R over Excel can really advance their understanding.

And never give up on telling people that they shouldn’t use a secondary axis with different scales! :+1:


#19

Absolutely right about not only thinking about proper data visualisation but also more creative methods.

I’ve given up on the dual scales issue because I’m already considered the crazy preacher guy. If only the unbelievers could see the true path to happiness …


#20

Thanks!

On the other hand I still consider the two scale chart issue contentious.

I do not mean to hijack the thread, but despite knowing the official ggplot2 explanation I still have a deep (personal) feeling that there are legitimate uses for two scale chart presentation, such as this famous chart of rig counts and oil industry employment over time, stolen from Medium.com; I feel that the two scales nicely illustrate the concurrence of the two phenomena (or not…).