Reflections on RStudio transparency

rstudio

#1

First off, I would like to thank the entire RStudio team: you guys are making R better, more popular, more accessible, and friendlier to use, and you are building a wonderful community. Thank you also to everyone involved in the tidyverse. Huge, huge fan here. R is really taking a new and great direction thanks to those packages and all of Hadley's amazing books.

Now, there is something that has bugged me a little about RStudio, and I thought that the best thing would be to open up a conversation with the RStudio team on the subject.

I often hear (e.g. at workshops within my university) or read (e.g. in blogs, including RStudio blogs) that RStudio has this or that cool feature, while the package and code running under the hood when that feature is used are not made clear.

One example (but I have many others) would be this post: https://support.rstudio.com/hc/en-us/articles/205753617-Code-Diagnostics. The package lintr is not mentioned once.

This has, in my opinion, several issues:

1/ There is an acknowledgement issue. Often, the packages that run under the hood were created by RStudio people themselves, and I guess, in those cases, that may not matter. But if anyone outside of the RStudio team created or contributed to these packages, it might.

2/ It is deceiving, in that it makes people think that these features were created by RStudio and that RStudio is necessary to use them. This seems to me to be a serious transparency issue. I use R in emacs ESS, and after reading the above post (to stay with that example), I truly believed that code diagnostics was a really cool feature that only RStudio users had access to, until I realized, quite a while later and totally by chance, that RStudio simply runs lintr under the hood and that lintr runs beautifully in ESS. I now use it all the time and love it (it has helped me adopt better formatting practices :slight_smile:)

3/ One of the benefits of writing code rather than using a GUI is understanding what is happening: getting out of the black box. RStudio is a phenomenal IDE for sure. If I were not an emacs user, I would absolutely use RStudio. But at times, the boundary between IDE and GUI gets crossed, and I find that unfortunate for users. I am surrounded by grad students who are very advanced R users but who have no idea, for instance, that clicking on "knit this file" (or whatever the button says) runs rmarkdown::render() and can totally be done outside of RStudio. RStudio has a beautiful and very convenient integration of the function, but the function belongs to a package that is independent of RStudio and runs in the console, the R GUI, etc. Of course, if they paid attention, they would see the line of code appear in the console when they click the button. But they don't, and some of the features they use regularly come out of a magic black box.
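To make the point concrete for readers outside RStudio, here is a minimal sketch of calling rmarkdown::render() directly from a plain R session (console, ESS, or Rscript). It assumes the rmarkdown package is installed and that pandoc can be found; the temporary .Rmd file and its contents are made up purely for illustration.

```r
# A minimal sketch: render an R Markdown file without RStudio.
# Assumes rmarkdown is installed and pandoc is available; the document
# below is a hypothetical throwaway example written to a temp file.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmd <- tempfile(fileext = ".Rmd")
  writeLines(c(
    "---",
    "title: \"Rendered outside RStudio\"",
    "output: html_document",
    "---",
    "",
    "One plus one is `r 1 + 1`."
  ), rmd)

  # Essentially what the Knit button asks for; render() returns the
  # path of the output file it produced.
  out <- rmarkdown::render(rmd, quiet = TRUE)
  print(out)
} else {
  message("rmarkdown or pandoc not available; skipping the render example")
}
```

Outside RStudio, the main practical difference is that pandoc has to be on the PATH (or otherwise findable by rmarkdown), since RStudio normally supplies its own copy.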

I hope my message does not come across as aggressive, as I really do not mean it that way!!! :slightly_smiling_face: RStudio is amazing, and it makes R friendly. It works beautifully out of the box, and anyone new to R gets a setup that takes months to put in place in ESS. It provides a phenomenal service to the R community. But I would love to see more transparency about what is being run in some of the cool features that are being implemented, and more education of users about what code is being called, so that the IDE remains an IDE and does not turn into a black box. I would also love to see it kept in mind that there are other good alternatives (if not as friendly to use) and that there is R life outside of RStudio, even for people who are utter fans of the tidyverse, the RStudio team, and Hadley's work. Ideally, I would love to see some merging and collaboration between these parallel systems: RStudio and emacs ESS; rmarkdown and org mode... They are both amazing and do more or less the same things.

Thank you sincerely for reading my lengthy message and for all of your amazing work on the development of R and the R community. If I have been unfair or wrong, please let me know! There may be things I am unaware of or did not see/read/notice that would change my views.


#2

I'll answer as someone who works for RStudio (though that is a relatively recent development in my career), and also as a long-time R user, long-time instructor, and long-time ESS user.

You're right that it is always good to acknowledge what & who does the work under the hood.

But I basically feel like the problems you point out tend to be somewhat self-resolving / self-regulating. I too have felt frustrated that people don't understand the difference between R and RStudio, or between Git and GitHub, etc.

I generally find that people start to make these distinctions and dig deeper exactly when they are ready to, i.e. when it is actually useful to them. I agree you need to make sure that you're leaving the right attributions and links behind, when you document things. But it's also true that many people use an IDE or other front-end because the tool itself does not offer a user-friendly interface. Many people are very productive without knowing the details and that's OK too. There is nothing inherently superior about interacting with R one way versus another.

So someone like you can play a productive role by pointing out when proper attribution has been left out of RStudio documentation. As for proactively documenting how to use a package like lintr in ESS, or how to integrate rmarkdown and org mode, it's a bit disingenuous to expect RStudio to do that directly. But it can still be a valuable contribution for someone else to make.


#3

Hi Jenny :slight_smile:

Thank you for your reply! Oh, of course, I wasn't asking RStudio to write anything about ESS or org mode in their posts :slight_smile: That last sentence was just the expression of a personal wish (totally as a dreamy side note): two sets of amazing tools are being developed by amazing people, running more or less in parallel and doing the same things, but without integrating at all :slight_smile:

Figuring out how to use lintr in ESS is obviously my job, as an ESS user (and it turned out to be totally straightforward). With the lintr example, I was only "complaining" about the fact that the package was never mentioned in the presentation of its new implementation in RStudio and that the post read as if this was an RStudio feature (rather than the implementation of a package feature).
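For anyone in the same position, a minimal sketch of running lintr from any R session (console, ESS, or Rscript) is below. As explained further down the thread, this is separate from RStudio's built-in diagnostics engine. It assumes lintr is installed; the one-line script being linted is a hypothetical example written to a temporary file.

```r
# A minimal sketch: run lintr outside RStudio.
# Assumes lintr is installed; the script content is a made-up example.
if (requireNamespace("lintr", quietly = TRUE)) {
  script <- tempfile(fileext = ".R")
  writeLines("x=1", script)  # '=' assignment with no spaces around it

  # lint() checks the file against lintr's default linters and returns
  # a "lints" object with one entry per style problem found.
  lints <- lintr::lint(script)
  print(lints)
} else {
  message("lintr is not installed; skipping the lint example")
}
```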

Now that I understand that the RStudio features are implementations of package functions, I don't "fall" into this mistake anymore. But it made things very hard for me at first, as a non-RStudio user. And, as a new R user attending workshops, asking the organizer what was actually running in R when they did this or that in RStudio (so that I could follow along) often proved useless, because they simply had no idea (and had never even wondered about it). So an R workshop was really an RStudio workshop, when, with only a little more transparency, it didn't have to be. In an age of making everything transparent, open source, cross-platform, etc., all of this feels a little wrong, somewhat at odds with the philosophy of the very people developing this amazing tool. Probably because it is not obvious from the inside. But it is when living outside of the RStudio realm.

It has also annoyed me, at times, to hear/read people use "RStudio" when they should use "R". But I get your point on this. And I guess I have to agree with you. That part is not that big a deal :slight_smile:


#4

With respect to this specific example, I'm with you on the mention of lintr, but I think it's worth pointing out that the description of the article (which is under the header Using the RStudio IDE) is: "This article outlines the features available in the IDE." So, to say "[i]t is deceiving" seems somewhat harsh, as it implies deliberate or reckless intent (at least to anyone who's gone through an L1 book :woman_judge:).

I think @jennybryan's Happy Git with R is a great example of a write-up that covers a bunch of options in addition to the integration of Git with RStudio: http://happygitwithr.com/git-client.html. That said, I think it's a big ask for every support article to act as a full-blown feature comparison (especially given the rapid proliferation of IDE options and integrations these days, which, like you, I think is a great thing). This, of course, is not the same thing as acknowledging the hard work of others that underlies the content, which should absolutely happen. The challenge is striking a balance in when, how, and where to do so (e.g. the tidyverse, of course, cannot exist without base R, something I say every time anyone brings up what I consider to be a false dichotomy; however, in the package release posts I've written, I don't open by reiterating that fact every time).

I've only been with RStudio for a few months, but, as Jenny mentioned, I'd say the RStudio employees and active user community are 100% with you in wishing that people better understood the difference between R and RStudio. (I'm pretty sure that, just yesterday, I referred to Chester Ismay and Albert Kim's awesome disambiguation chapter in their book 3 or 4 times!)

Folding in a couple of examples for convenience:

  • Loop not working, maybe due to Package ‘grid’ is not available (for R version 3.1.2) but I am using R studio Version 1.0.153?
  • Issues Updating

I think much of what you're describing is a tension that exists when anyone uses a GUI, and I think the IDE developers work extremely hard to try and build in transparency (e.g. with the object inspector providing the equivalent code to retrieve the same object as you click through).

If/when you think something is lacking in acknowledgement, please speak up. I think the most productive means of doing so is opening an issue or submitting a PR, when possible, or contacting the author of a post. This is certainly a productive thread/discussion, but, given that (in my experience) errors of omission aren't usually born of ill will, I think direct corrections would be valuable, especially since it's often hard for "experts" and/or writers to see what's missing when they hold the knowledge tacitly.

I haven't used ESS, but I didn't start using RStudio until ~2014/2015 either (it didn't exist when I first started using R in high school). So my experience of RStudio has always been about the interface (and I likely don't use many of the GUI features I would if I'd been in RStudio from the start). But my goal in using RStudio, and in sharing what I do with it with others, is never to obscure the existence/independence of R the language. I can only speak for myself, but I think an effort toward transparency and disambiguation is mutually beneficial. :confused:

EDIT: The below ⇩ has now occurred! :slightly_smiling_face:!
Actionable feedback is always ideal, so I'll look into seeing if we can update that support article (I genuinely don't know where that part of the website resides :grimacing:).


#5

To clear up some misconceptions: lintr is not actually used by the real-time diagnostics in the RStudio IDE. They are implemented in custom C++ code using a purpose-built R parser and do not share any code with the lintr package. lintr can be used in RStudio as shown in the lintr README, but this is independent of the built-in diagnostics.


#6

Oh! Interesting! They seem to produce the same messages. Is there any difference, then, between running lintr in RStudio and activating the built-in diagnostics tool?


#7

Thank you for taking the time to give such a good answer! And I really appreciate the examples you are citing in which you made the distinction between R and RStudio very clear. Totally love those. I have to run, but will write a little answer later.


#8

Some general differences between RStudio's diagnostics engine and lintr:

  1. lintr comes with far more diagnostic checks related to code style (following Hadley's style guide at http://r-pkgs.had.co.nz/style.html), whereas RStudio only attempts to warn on missing or extraneous whitespace (e.g. between operators)

  2. RStudio's diagnostic engine wants to be as fast as possible, since it gets run quite eagerly (e.g. whenever you save a file, or when the file is idle). To that end, we wanted the base to be in C++ so that diagnostic feedback could be returned as quickly as possible. lintr is primarily R code and makes use of the codetools package for static analysis, which can be slower.

  3. Since the RStudio diagnostics engine uses its own ad-hoc parser, it tends to fail in a lot more situations than lintr might (especially related to code that uses non-standard evaluation, formulas, or other hard-to-analyze constructs).


#9

For the knit button / rmarkdown::render() example, I’m curious what you think would be more transparent behavior? The underlying code gets run in the console, and the documentation is, to my eyes, pretty clear about what’s going on. E.g., the article on IDE integration from the RMarkdown site says:

The “Run Document” button is a shortcut for the rmarkdown::render command. It lets you quickly render your .Rmd file into an interactive document hosted locally on your computer. The RStudio IDE will display your document in a preview window.
[…]
If your .Rmd file does not contain runtime: shiny, the RStudio IDE will display a “Knit HTML” button in place of the “Run Document” button. The “Knit HTML” button works in the same way. It renders your .Rmd file and launches a preview of your output document.

Given that RStudio can’t make people notice the console or read the documentation, what more are you hoping to see here?

I want to clearly note that I think this is a good discussion to have. I am very sympathetic to the frustration of watching people conflate R and RStudio, and I totally get the concern that people might feel unnecessarily dependent on an IDE or erroneously give credit to RStudio for features that are part of base R or a specific package.

The thing is, I think some of these problems will happen no matter what RStudio (the company or the IDE) does. My experience has been similar to what @jennybryan said, that people start making these distinctions when it becomes important to them — and that can be a long time after it seemed to a bystander like something they ought to know. Though finding myself in a workshop where the leaders didn’t know how to do what they were teaching outside RStudio would definitely try my patience — that sounds really frustrating!


#10

Thank you @jimhester and @kevinushey for this information. It is very interesting to learn a tiny bit more about how things work.


#11

It was not frustrating. It made it very challenging to follow any workshop without installing RStudio. But mostly, it was surprising to me.

I find living in Linux in a Windows and macOS world very easy, but living in ESS in an RStudio world challenging.

The documentation is indeed very clear and excellent. And yet, with one or two exceptions, nobody around me (grad students and workshop organizers running complex GLMM modelling in R) knows this (the render example). The very few who do are the ones who use Git, the shell, and other computing tools, and who are interested in the code, not just the results (e.g. graphs or analyses). I am an absolute fan of Hadley and all of his work, and an absolute fan of the tidyverse. Because he is such a superstar, followed and read by so many, maybe he has a particularly large responsibility in how ideas, messages, etc. are presented. It is probably unfair of me to put that responsibility on him (and others like him), and I am not blaming anything he has ever written or said. But he has a huge impact on a huge chunk of the R community, perhaps particularly on its learning, younger part, which is increasingly becoming the new R community. So being clearer or more specific about these things (as @mara did in the links she posted) could really help. Maybe @jennybryan is right that it may not matter if people are a bit confused about these things, and that they will learn them if/when they become useful. But understanding things better and reducing confusion is always good, and it would help people who work outside of RStudio, making the R community, which is so amazing, more inclusive. Striving for clarity, even when it is not necessary to get the work done, is probably a good thing.

A post or blog about RStudio features, depending on how it is phrased, can shape the perceptions of users, workshop organizers, new learners, etc. My lintr example was a terrible one, since I was myself ignorant and confused on that particular point. But I have seen and heard examples where the distinction between R/packages and RStudio was blurred or could have been made clearer (as it is in the excellent documentation). Because the truth is, people read blogs much more often than they read documentation.


#12

Thank you very much, everyone, for taking the time to share your thoughts with me. I truly appreciate it. I can't emphasize enough how much I appreciate the work of the RStudio team and everyone involved in the tidyverse. I truly live in the tidyverse, so the concerns I expressed in this thread felt like dissonances with everything else. My motivation to post really came from that (perceived) dissonance: something didn't feel quite right coming from the very group of people I look up to as guides on my little R journey, and I wanted to poke at it.

I am really glad I started this thread because it allowed me, thanks to your lengthy and thoughtful replies, to realize that I was wrong in some cases (e.g. my lintr example) and biased in others (by noticing instances when distinctions between R and RStudio could have been more clear, while not being aware of instances where they were).

Getting replies from many people from the team (and this quickly) is more than I could have ever hoped for and being able to discuss my concerns put them at ease. As an added bonus, this allowed me to realize how active, open, and helpful this platform in the R community is (I just joined the site). The R community in general is remarkable...

I hope my perceptions (however naive, biased, and uninformed) sparked some thoughts about inclusivity in the R community, with or without the use of RStudio, and did not come across simply as annoying (or worse, upsetting!), whiny, and unfair complaints. I meant my messages to be entirely unaggressive and as uncritical as possible, but I found it hard to express my ideas with the right tone (the French are so rude! :wink: ).

Oh, and before closing, I'd like to mention that I have nothing against RStudio as an IDE. My only reason for not using it is that it is so hard to take an emacs user out of emacs :wink:

Thank you everyone for all the awesome tools, work, help, and contributions to the R community :heart:


#13

Just as a heads-up and maybe a way to increase awareness, RStudio does have a blog where all of those things are announced as they become available - https://blog.rstudio.com/ .


#14

That thread, which showed several confusions (e.g. about pandoc, rmarkdown, or View()), is in line with my experience. These widespread confusions are not all that obvious day to day, because people live in RStudio and the relevant topics never come up. But the minute an RStudio user pokes out of RStudio, they become really evident. After reading all your great replies earlier, I am not sure which of these two views to believe anymore:

  • RStudio's task is to make RStudio as great as possible (and it sure does that well! :slightly_smiling_face: ) and it is not its role to teach people the underlying functioning of the IDE

or

  • The RStudio team is phenomenal in putting so many excellent resources out there (open source! :slightly_smiling_face: ), so it seems to value the R community's understanding of R. Since there seems to be a lot of confusion on this specific topic, it would be invaluable to try hard to clarify these points whenever possible

I still tend to lean towards the 2nd, but the RStudio team does so much insanely amazing work that maybe I should be forgiving on this little point and accept the first :slightly_smiling_face:


#15

I would say that the biggest constraint here is time! One of the reasons that Stack Overflow and this community are so valuable to open source is that they "crowdsource" some of the Q&A, which (as with so many code-related queries) usually involves some digging around, running code, etc., which is fun to do when one has the time. But in a lot of scenarios, the fastest answers/suggestions just come from experience.

Again, I don't think anyone is trying to be misleading. AFAICT the independent capabilities of R came up almost immediately:


and I'd chalk @RobertMyles' statement re. View() up to benign misunderstanding (again, it's someone trying to help based on their experience; the benefit of the crowd is that you read it and know that not to be the case, and now everyone does!)

Last week Jim Hester mentioned something to the effect of there being 1207 functions in base! That's a lot of functions, and thank goodness for them, as they make everything else go round. That said, since we (people writing R code) don't typically use dplyr::filter()- or stats::filter()-style notation, it's easy to get mixed up based on the context in which you used/learned something.
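A small illustration of the filter() point, using only base R: find() reports which attached package supplies a name, and pkg::fun() notation removes the ambiguity regardless of what is attached. This is a minimal sketch; the dplyr behaviour is only described in a comment, not run.

```r
# A minimal sketch using only base R.

# Search-path entries that define `filter`. In a default session this is
# "package:stats"; after library(dplyr), "package:dplyr" would come first,
# masking stats::filter for unqualified calls.
print(find("filter"))

# A namespace-qualified call always means stats' linear filter, whatever is
# attached. Here: a centred 3-point moving average of 1:5 (the ends are NA).
ma <- stats::filter(1:5, rep(1 / 3, 3))
print(ma)
```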

I'm pro-disambiguation; I just think it's a big ask! Also, there's a question of audience: I think the pandoc question is a good example of a case where it is appropriate (and desirable) to differentiate what's going on, but that's not true of every scenario.


#16

When I posed the "How do I configure R so programs developed in RStudio run successfully in R?" question, I was viewing RStudio as a great interface for my data analysis group to develop scripts that could then be put into production (i.e., run with R in batch mode by folks across the whole organization). I did not expect RStudio to have pre-installed some system requirements for packages that are not part of the R installation. That makes it harder to move from dev in RStudio to prod in R.
If the RStudio folk could document the system requirements for some packages that RStudio pre-installs, I could put them on my machines that run R, and resolve the dev-prod difference. Or maybe the RStudio folks could provide instructions for configuring R so it can see the RStudio resources, and I could just install RStudio on each machine where I run R, and follow those instructions to resolve the dev-prod difference.
I recognize that I'm just one kind of RStudio user - treating it as a dev tool on my way to a production system, so my needs differ from the folks who do it all in RStudio.
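One way to pin down a dev-prod difference like this is to print the relevant parts of the environment from inside RStudio and from a batch Rscript run, then compare the two. The sketch below is only an assumption about where differences tend to show up: it uses pandoc, which RStudio bundles, as the example of a tool that may be present in RStudio but missing in batch R, and it relies on RStudio setting the RSTUDIO and RSTUDIO_PANDOC environment variables in its sessions.

```r
# A minimal sketch: report environment details that often differ between
# an RStudio session and a batch Rscript run. Run it in both and compare.
env_report <- list(
  r_version      = R.version.string,
  in_rstudio     = identical(Sys.getenv("RSTUDIO"), "1"),  # "1" inside RStudio
  lib_paths      = .libPaths(),                            # where packages are loaded from
  pandoc_on_path = unname(Sys.which("pandoc")),            # "" if not on the PATH
  rstudio_pandoc = Sys.getenv("RSTUDIO_PANDOC")            # RStudio's bundled pandoc dir, "" if unset
)
str(env_report)
```

If pandoc_on_path is empty in the batch run but rstudio_pandoc is set inside RStudio, installing pandoc on the production machines (or putting it on their PATH) is the kind of fix being asked about.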


#17

If it helps, here's the list of applications that make up my statistical computing environment. I also tend to develop in RStudio, but implement things via batch files that work outside of RStudio.

Name | Purpose | License | Minimum Version
R | Literate statistical programming | GNU GPL | 3.2.1
RStudio | IDE | AGPL | 0.99.467
Rtools | R package development | GNU GPL | 33
JAGS | Bayesian statistical analysis | GNU GPL, MIT | 4.0
MiKTeX | Document rendering | Freely distributable | 2.9
Pandoc | Document rendering | GNU GPL | 1.15
Strawberry Perl | Enhances text searching in R | GNU GPL | 5.22
FFMPEG | Animated plotting | LGPL | 2015-09-28
PuTTY | SSH client for distributed computing | MIT | 0.66
Bitvise SSH Server | SSH server for distributed computing | EULA | 6.45

I work on Windows.

I have a document somewhere that I had approved for external distribution. It goes through all of the software in my statistical computing environment, with some installation instructions specific to Windows. If I can find it, I'll post it here. I'd strongly recommend developing such a document so that you have your own internal reference for all the tools you need to get things done.


#18

Turns out, both @RobertMyles and I were correct: we were just talking about two very similar but different View() functions that have the same name and do the same thing :laughing:. You can read the follow-up to that conversation in the initial thread :slight_smile: As with my lintr mistake, it seems that RStudio sometimes develops its own version of things :slight_smile:


#19

(Which shows that I also need educating and need my own misconceptions corrected :smile: )