Data wrangling efficiency - spatial objects & subsetting before vs. inside app

Hi there,

I am creating an app and have 2 questions pertaining to Shiny efficiency: data storage in terms of object types and data wrangling inside or outside the app.

About the data:

The dataset contains 12 variables at the county level, across 50 states and spanning 11 years. I have an sf object of counties, an sf object with the aforementioned attributes included, and an aspatial dataset (a normal data table without any geometry) containing FIPS codes for linking to the sf object.

About the app:

I am creating a Shiny app that examines various variables across the US; the user can filter by state or view the entire US. The user can select 2 variables at a time to be shown in a bivariate scatter plot as well as 2 univariate plots, and can also select a variable to be visualized in a leaflet map. Here is an example of an app very similar to what I am trying to do.

Efficiency questions

  1. Given that I have aspatial data visualizations (the scatter plots) as well as spatial (leaflet), what would be the most efficient way to store and load the data? Having a data table of the data as well as an sf object with all needed attributes? Or just a data table and an sf object of counties, then link by FIPS code inside the app?

  2. Based on the user's selection, the data will be subset to a specific state. Would it be more efficient to create a dataset for each state in advance to eliminate subsetting in the actual Shiny app (this would be 51 datasets: one per state plus one for all states)? The answer to this may depend on the answer to the previous question.

Thank you in advance for any insights you are able to share.

From Joe Cheng's rstudio::2019 keynote...

  1. Optimize
    1. Move work outside of Shiny (very often)
    2. Make code faster (very often)
    3. Use caching (sometimes)
    4. Use async (occasionally)

Efficiency:

  • If you have constant data wrangling procedures, they should be moved outside of the Shiny application. (Perform once, not n times.) The final result can be saved to an RDS file that is loaded outside of your server function. This means it will be read once at application start, not at every user session.
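As a rough illustration of "perform once, not n times", here is a minimal sketch of joining the attribute table to the county table in a separate script and saving the result. The object names (`counties`, `attributes_tbl`, `joined`) and the file name are made up for the example, and plain data frames stand in for the real sf object so the snippet runs without sf installed.

```r
# prep_data.R (hypothetical file name): run once, outside the app

# Stand-in for the county sf object: one row per county.
# A real sf object would also carry a geometry column.
counties <- data.frame(
  FIPS  = c("01001", "01003", "06001"),
  state = c("AL", "AL", "CA"),
  stringsAsFactors = FALSE
)

# Stand-in for the aspatial table: one row per county per year.
attributes_tbl <- data.frame(
  FIPS = c("01001", "01001", "01003", "06001"),
  year = c(2010L, 2011L, 2010L, 2010L),
  var1 = c(1.2, 1.4, 3.1, 0.7),
  stringsAsFactors = FALSE
)

# Join once, here, instead of on every user interaction.
joined <- merge(counties, attributes_tbl, by = "FIPS")

# Save the prepared object; the app only has to read it back.
rds_path <- file.path(tempdir(), "joined.rds")
saveRDS(joined, rds_path)

# In app.R this call would sit above the server function,
# so it runs at application start rather than per session.
joined_in_app <- readRDS(rds_path)
```

One thing to keep in mind with real data: joining a county-by-year table onto the geometry repeats each county's geometry once per year, so it may be worth comparing the size of the joined object against keeping the geometry and the attributes as two separate saved objects.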

Since everything is keyed by a FIPS code, I would make a list object with the FIPS codes as the keys and, as each value, a list holding the subsetted sf data and the subsetted data table. This whole object can be stored in a single .rds file (see ?readRDS) that can be read outside the server function.

Given a user's selection for the FIPS code, pull the corresponding value from your list object.

Example ideas:

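# some_script.R (run once, outside the app)
all_data <- list()
all_data[["FIPS1"]] <- list(
  # subsetted_spatial: placeholder for the sf rows belonging to this FIPS code
  spatial = subsetted_spatial,
  dt = subset(aspatial, FIPS == "FIPS1")
)
#... (repeat for the remaining FIPS codes)
saveRDS(all_data, "all_data.rds")

# app.R
library(shiny)
#...
# Read outside the server function: once at application start, not per session
all_data <- readRDS("all_data.rds")
server <- function(input, output) {
  # use all_data...
}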
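To fill in the `#...` step above, here is one possible way to build the whole list in a loop rather than one key at a time. This is only a sketch under a few assumptions: FIPS codes are stored as zero-padded 5-character strings (so the first two characters are the state FIPS), the `"US"` key and the `get_selection()` helper are invented for the example, and plain data frames stand in for the sf object.

```r
# Stand-ins for the real objects (a real `counties` would be an sf object)
counties <- data.frame(
  FIPS = c("01001", "01003", "06001"),
  stringsAsFactors = FALSE
)
aspatial <- data.frame(
  FIPS = c("01001", "01001", "01003", "06001"),
  var1 = c(1.2, 1.4, 3.1, 0.7),
  stringsAsFactors = FALSE
)

# Assumption: the first two characters of a county FIPS code are the state FIPS
state_keys <- sort(unique(substr(counties$FIPS, 1, 2)))

# One list entry per state, each holding the spatial and aspatial subsets
all_data <- lapply(state_keys, function(s) {
  list(
    spatial = counties[substr(counties$FIPS, 1, 2) == s, , drop = FALSE],
    dt      = aspatial[substr(aspatial$FIPS, 1, 2) == s, , drop = FALSE]
  )
})
names(all_data) <- state_keys

# Extra entry for the "entire US" view (hypothetical key name)
all_data[["US"]] <- list(spatial = counties, dt = aspatial)

# Hypothetical helper: fetch the precomputed subsets for a selection
get_selection <- function(all_data, key) {
  if (!key %in% names(all_data)) {
    stop("Unknown selection: ", key)
  }
  all_data[[key]]
}

alabama <- get_selection(all_data, "01")
```

Inside the server function, the lookup could then be wrapped as something like `selected <- reactive(get_selection(all_data, input$state))`, where `input$state` is an assumed input id whose choices match the list keys. The per-session work is then a single list lookup instead of a filter over the full dataset.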

Thank you, @barret! Joe Cheng's keynote and your tips have been incredibly helpful.
