R & RStudio - The Interoperability Environment for Data Analytics
Curtis Kephart and Lou Bajuk - 2020-08-17
On the RStudio Developer Blog we’ve recently written a series on interoperability and R, including why enterprises should embrace workflows that are open to diverse toolsets.
The designers of R, from its very beginnings, have dealt directly with how best to tap into other tools. Statisticians, analysts, and data scientists have long been challenged to bring together all the statistical methods and technologies required to perform the analysis the situation calls for—and this challenge has grown as more tools, libraries, and frameworks become available.
John Chambers, writing on the design philosophy behind the S programming language, the predecessor to R, put it this way:
“[W]e wanted [users] to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.”
Part of this design philosophy is to minimize the amount of effort and overhead required to get your analytics work done. It is not fair to assume that every data scientist is programming all day, or coming from a computer science background, but they still need to implement some of the most sophisticated tools programmers use.
The ecosystem around R has striven to strike the right balance between a domain-specific environment optimized for data science workflows and output, and a general programming environment. For example, CRAN, Bioconductor, rOpenSci, and GitHub provide collections of packages written with data science in mind, which extend core R’s functionality, letting you tap into (and share) statistical methods and field-specific tools when, and only when, you need them.
Many of the most popular packages offer interfaces to tools in other languages. For example, most tidyverse packages include compiled (C/C++) code. Interestingly, core R itself connects you to tooling mostly written in other programming languages. As of R 4.0.2, over 75% of the lines in core R’s codebase are written in C or Fortran (C 43%, Fortran 33%, and R 23.9%).
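For readers curious how a language breakdown like the one above might be estimated, here is a minimal sketch in Python. It is not the method used to produce the figures quoted in this post. It assumes a local checkout of a source tree, counts raw lines (including blanks and comments), and relies on a small, hypothetical extension-to-language map (`EXT_TO_LANG`). The function names are likewise made up for illustration.

```python
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately small extension map; a real survey of a
# codebase would need a fuller table and rules for headers, docs, etc.
EXT_TO_LANG = {
    ".c": "C",
    ".h": "C",
    ".f": "Fortran",
    ".f90": "Fortran",
    ".r": "R",
}


def count_lines_by_language(root):
    """Count raw lines per language for every recognised file under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        lang = EXT_TO_LANG.get(path.suffix.lower())
        if lang is None or not path.is_file():
            continue
        # Read as bytes so unusual encodings do not raise decoding errors.
        with path.open("rb") as fh:
            counts[lang] += sum(1 for _ in fh)
    return counts


def language_shares(counts):
    """Convert raw line counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}
```

Calling `language_shares(count_lines_by_language("path/to/source"))` on a checkout (the path here is a placeholder) returns a dictionary of percentages. Different counting rules, such as skipping comments or generated files, will shift the numbers, so treat any single breakdown as approximate.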
RStudio - design philosophy and development priorities
Our mission at RStudio is to create free and open source software for data science, scientific research, and technical communication. R is a wonderful environment for data analysis, and we’ve focused on making it easier to use. We do this through our IDE and open source packages, such as the tidyverse. We also do this by making data science easier to learn through RStudio Cloud and our support for data science education. And we help make R easier to manage and scale out across an organization through our professional products, supporting best practices for data science in the enterprise through our solutions team.
As part of this effort, we have focused heavily on enabling and supporting interoperability between R and other tools. We recently outlined in a blog post how the RStudio IDE allows you to embed many different languages in R Markdown documents, including:
- R and Python together,
- SQL code for accessing databases,
- Bash code for shell scripts,
- C and C++ code,
- Stan code with rstan for Bayesian modeling,
- and many more languages. You can find a complete list of the many platforms supported in the language engines chapter of the book, R Markdown: The Definitive Guide.
And we work with the community to support:
- Bilingual data science teams, by providing a single platform for data scientists to develop in R or Python (RStudio Server Pro), and to deploy applications built with either (through RStudio Connect)
- Making it easy to create web applications with shiny or put models into production via plumber APIs
- Supporting easy access to data sources, such as dbplyr for database access and wrangling.
- Incubating Ursa Labs, which is focused on building the next generation of cross language tools, leveraging the Apache Arrow project.
- Integration from R with other modeling frameworks, including TensorFlow and Spark MLlib
- Using sparklyr and Launcher with Kubernetes to distribute your calculations or modeling operations over many machines, which we will be discussing in more depth in an upcoming blog post.
This list goes on and on and grows by the week.
R with RStudio is a wonderful environment for anyone who seeks understanding through the analysis of data. It does this by finding a balance between a domain-specific environment and a general programming language that doesn’t prioritize data scientists. That is, it strives to be an environment optimized for analytics workflows and output. At the fulcrum of this balance is extensive interoperability: the ability to pull in interfaces to other technologies as they are needed, and a vibrant community sustaining them. This has been the goal for R since its initial design, carried forward through the extensive work shared by the R community and significant continued investment by RStudio.