End-to-end Data Science with RStudio Connect (Meetup Discussion)

On Feb 23rd, we hosted a virtual meetup for teams using RStudio Connect (or those who may be interested in using it in the future!). I've created this thread to recap the questions asked during the session and as a place to post any follow-up questions that we didn't have a chance to cover. Thank you all for joining!

Link to recording: https://zoom.us/rec/share/YSe5Ca_LrTUIuuumuOKG44FXo9O3WqxGnY-ddWQThs_76otqGA8318dZUXFW4UXL.c7bbbeZ1wpH8DZXj

Passcode: Uk2%.&n7

End-to-end Data Science with RStudio Connect
Presented by Alex Gold

Organizations often find that bringing a data science pipeline into production can be painful, but it doesn’t have to be. Drawing on the experiences of numerous RStudio Pro customers, you’ll see how RStudio Connect can be a platform for your whole data science workflow, from ETL, to model building, and model deployment and management.

Alex is a Solutions Engineer at RStudio, where he helps organizations succeed using Open Data Science and RStudio Pro products. Before coming to RStudio, Alex was a data scientist and data science team lead and worked on economic policy research, political campaigns, and federal consulting.

Live Q & A from the meetup:
(please add any other follow-up questions below!)

Are there any limitations on data size for pinning?

An important factor in deciding whether to use a pin is the size of the data or object involved. Pins, in their simplest form, are objects transmitted through web requests. One way to judge whether an object is too big to be a pin is to consider how long a web request to retrieve it will take. For example, if an object is 100 MB and the internet connection speed is 25 MB/s, a pin_get call cannot return in less than 4 seconds. In many cases, it will take longer to return the data. Creating the pin will also take substantially longer, because most networks have a slower upload speed than download speed. You can see an object’s size in R using code like:

format(object.size(data_frame), units = "MB")

As a general rule of thumb, you should not use pins for objects that are larger than 500 MB.
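The arithmetic above can be sketched in base R. This is only an illustration: the helper names (object_size_mb, min_transfer_seconds, pin_size_ok) are hypothetical and not part of pins, and the 500 MB default simply mirrors the rule of thumb stated here.

```r
# Hypothetical helper: object size in MB (1 MB = 1024^2 bytes, the same
# convention as format(object.size(x), units = "MB")).
object_size_mb <- function(x) {
  as.numeric(object.size(x)) / 1024^2
}

# Lower bound on transfer time: size divided by connection speed.
# Real requests will usually take longer than this.
min_transfer_seconds <- function(size_mb, speed_mb_per_s) {
  stopifnot(speed_mb_per_s > 0)
  size_mb / speed_mb_per_s
}

# Hypothetical check against the 500 MB rule of thumb.
pin_size_ok <- function(x, limit_mb = 500) {
  object_size_mb(x) <= limit_mb
}

min_transfer_seconds(100, 25)  # 4, the example from the text
pin_size_ok(mtcars)            # TRUE, mtcars is only a few KB
```

Checking the size before pinning is cheap, so it can reasonably sit at the top of any scheduled job that writes a pin.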

Alternatives to Pins for large objects: https://pins.rstudio.com/articles/advanced-too-big.html

Do you have a suggestion of the best way to organize and document which Pins exist?

For organization, you could use tags, which is what many teams do today, but we’re also working on additional solutions in Connect.

For more info, see “Managing Your Content in RStudio Connect”: https://support.rstudio.com/hc/en-us/articles/228272368-Managing-your-content-in-RStudio-Connect

You can also use the pins::pin_find command to find all the pins on a board by keyword and the pins::pin_versions command to find all the versions of a particular pin.
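A minimal sketch of those two calls, assuming the pins API as it existed at the time of this meetup (versions before 1.0) and a board already registered under the name "rsconnect". The wrapper names find_pins and list_pin_versions, the keyword "sales", and the pin name "my_pin" are hypothetical.

```r
# Assumes the legacy pins API (pins < 1.0) and a registered "rsconnect" board.

# Search a board for pins whose name or description matches a keyword.
find_pins <- function(keyword, board = "rsconnect") {
  pins::pin_find(text = keyword, board = board)
}

# List the stored versions of a single pin.
list_pin_versions <- function(pin_name, board = "rsconnect") {
  pins::pin_versions(name = pin_name, board = board)
}

# Example calls (these need the pins package and a registered board):
# find_pins("sales")
# list_pin_versions("my_pin")
```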

Is the Pin app/process-specific or could a single table be accessible to multiple apps?

A single pinned table can be accessed by multiple apps. Sharing the same data across several pieces of content on RStudio Connect is one of the best use cases for pins. The first half of this talk from rstudio::conf(2020) has more on some of the ideal use cases for pins.
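As a rough sketch of that pattern, one piece of content (for example a scheduled R Markdown report) could write the pin and any number of others (for example Shiny apps) could read it. This assumes the legacy pins API (before 1.0) and a board registered as "rsconnect"; the function names and the pin name "shared_table" are hypothetical.

```r
# Assumes the legacy pins API (pins < 1.0) and a registered "rsconnect" board.

# Producer: run from one piece of content, e.g. a scheduled report.
publish_shared_table <- function(data, pin_name = "shared_table") {
  pins::pin(data, name = pin_name, board = "rsconnect")
}

# Consumer: run from any other piece of content that needs the same data.
read_shared_table <- function(pin_name = "shared_table") {
  pins::pin_get(pin_name, board = "rsconnect")
}

# Example calls (these need the pins package and a registered board):
# publish_shared_table(mtcars)
# shared <- read_shared_table()
```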

Does RStudio Connect broker the API connection and pull the data?

Yes. As of version 1.8.6 of RStudio Connect, you must provide an RStudio Connect API key to any content deployed on RStudio Connect (e.g. a scheduled R Markdown or Shiny app) that will be creating or accessing a pin on RStudio Connect.

The code would look like:

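# Register the RStudio Connect board with an API key and server URL
pins::board_register_rsconnect(key = "the-rstudio-connect-api-key",
                               server = "https://rstudio-connect-server")

# Retrieve a pin from the registered board
pins::pin_get("my_pin", board = "rsconnect")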

We recommend keeping the API key in an environment variable to avoid it being exposed in code.

More details on how to provide an API key here.
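One way to follow the environment-variable recommendation is sketched below. The variable names CONNECT_API_KEY and CONNECT_SERVER are assumptions (use whatever names your deployment defines), and read_required_env and register_connect_board are hypothetical helpers, not part of pins.

```r
# Hypothetical helper: read an environment variable and fail loudly if unset,
# so a missing key shows up as a clear error rather than a failed request.
read_required_env <- function(var_name) {
  value <- Sys.getenv(var_name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", var_name, " is not set", call. = FALSE)
  }
  value
}

# Register the board without hard-coding the key in the source.
# Assumes the legacy pins API (pins < 1.0); the variable names are assumptions.
register_connect_board <- function() {
  pins::board_register_rsconnect(
    key    = read_required_env("CONNECT_API_KEY"),
    server = read_required_env("CONNECT_SERVER")
  )
}

# Example call (needs the pins package and both variables set):
# register_connect_board()
```

Keeping the lookup in one helper means the key never appears in the deployed code or in version control.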

Is pinned data stored in memory or on the file system?

A pin is like any other piece of content deployed on RStudio Connect and is stored on the filesystem. Once loaded into an R or Python session, the data is stored in memory in that session.
