Responding to questions about data security

Hi folks,

I'm a professor at a university and I have a concern about the security of RStudio (e.g., server, desktop, etc.) and also R. I'm exposing my students to data science via RStudio, and many of them then take R/RStudio into the workplace.

An instance occurred recently where one student was interrogated by their CEO regarding the use of R for data analytics, and I'd like to have someone's perspective on how to respond.

In this instance, the CEO was particularly concerned that because R is an "open source" tool, they cannot verify that (for instance) some Russian hacker created a package that could (unbeknownst to them) be capable of downloading all of an organization's data and shipping it to Moscow (hypothetically). The CEO, in essence, didn't trust the open-source nature of R and RStudio.

As an institution, we were then approached and asked to teach our students tools that were certified in some manner (e.g., I was asked to consider Excel; a colleague recommended Stata). I don't think this is necessary and I suspect that this isn't really an issue, but I was not quite sure how to respond.

Does anyone have any guidance?

Thanks in advance,

Chris

I first wrote a long answer, but I think my main point doesn't need a long text :slight_smile:

For people who make the kind of argument this CEO does, I don't think evidence is going to do it for them. They are set in their ways, and if they wanted to learn about something new, they have every opportunity. R/Python/RStudio are used in all manner of "certified" situations, and they are just like any other tool (open source or otherwise) -- the attack surface depends on the competence of the people who deploy such systems, not necessarily on the tool itself.

Isn't this exactly the wrong way around?
In open source software, you can take the time to read the source code and see what it does. You can rewrite it (make your own version) if you don't like something (like objecting to it spinning up an email sender and mailing out your data...).
How can you validate a closed-source binary? You can't. What you can do is pay someone for that binary, and then try to hold them legally liable if something bad happens because of its contents.

Really what is being traded away is someone to pass the buck to...

From CRAN's front page:

Note that we generally do not accept submissions of precompiled binaries due to security reasons. All binary distribution listed above are compiled by selected maintainers, who are in charge for all binaries of their platform, respectively

I'm not a legal expert, but maybe there would be someone...

I think the CEO does have a small point: it might make sense in a corporate environment to disallow downloading R packages posted on random GitHub repositories. On the other hand, it would be surprising if there were no open source software already in use. No Linux, Firefox, Apache Web Server, Nginx ... That might be true, but I think it would be very unusual in a company that is not very small.

In large production environments, organizations usually maintain their own repositories of "approved" packages to avoid this potential issue.
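
As a rough illustration of what that can look like on the R side, here is a minimal sketch of an `.Rprofile` fragment that points R at an internal repository by default. The repository URL and the label `APPROVED` are placeholders I made up for the example, not a real service; `options(repos = ...)` itself is the standard R setting that `install.packages()` reads.

```r
# .Rprofile fragment (site-wide or per-project).
# ASSUMPTION: the URL below is a placeholder, not a real repository.
local({
  options(repos = c(APPROVED = "https://r-packages.internal.example.com/approved"))
})

# With this in place, install.packages() and update.packages() look up
# packages in the repository configured above by default, instead of a
# public CRAN mirror.
# install.packages("somepackage")  # hypothetical package name
```

Note that this only changes the default: a user can still pass a different `repos` argument or install from elsewhere, so in practice it would be combined with network restrictions if the goal is enforcement rather than convenience.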

I sometimes work in environments without access to the internet. That is one sure way to make certain this can't happen. I agree that there is probably no convincing this CEO. What's stopping SAS, SPSS, or Stata from doing the same thing? They're not open source, so we can't even see how the sausage is made.

I like FJCC's point about Linux being open source and something of an industry standard.

Thanks folks! This has been really helpful, particularly the point that banning open source would be impractical.