Up to now I have only played with Shiny locally, with small datasets loaded from CSV directly into RAM. I would now like to develop a dashboard for use within my company that displays a year of daily time-series data, counted from the current day, from a remote data source (BigQuery). The data lives in a (virtualized) view on BigQuery; every time the view is queried, one year's worth of data (365 days) counted from the current day is returned.
I guess I’m a bit (very) confused, as it’s the first time I've done this, but I have some basic questions about best practices that I couldn’t find a straightforward answer to:
Is it a good idea to write the Shiny app so that it queries the needed data from the database, e.g. via dplyr, at startup? I suspect this would slow down the startup of the app.
Additionally, if users A, B, C, … were running sessions on the app at the same time, each session would send its own query to the remote source and load the result into memory; wouldn’t this lead to many copies of the same data being loaded into memory, eventually exhausting it? (The repeated queries would be particularly painful with BigQuery, as you are billed per GB queried.) (I hope you understand what I mean…)
Is it better practice to set up, e.g., a cron job that pulls the needed data onto the machine hosting the Shiny server (perhaps as .RData, or into MonetDBLite?) and let the Shiny app load the data locally? I imagine this would speed up the startup time. Users would still duplicate the data in RAM, though (is there a way to load a dataset into RAM once and let every app instance that is spun up access that dataset?), and for a production environment running cron jobs like this seems a bit spooky.
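To make the second option a bit more concrete, this is roughly what I have in mind. It is only a sketch: `fetch_last_year()` is a stand-in I made up for the real BigQuery query (here it just fabricates one row per day so the code runs without credentials), and the function and file names are hypothetical too.

```r
# refresh_data.R (hypothetical name): meant to be run once a day, e.g. by cron.

# Stand-in for the real query against the BigQuery view (e.g. via dplyr).
# It fabricates 365 daily rows ending today so the sketch is runnable.
fetch_last_year <- function(today = Sys.Date()) {
  days <- seq(today - 364, today, by = "day")
  data.frame(date = days, value = seq_along(days))
}

# Save to a temporary file first and rename it afterwards, so that the app
# should not pick up a half-written file while the refresh is running.
refresh_cache <- function(path, today = Sys.Date()) {
  tmp <- paste0(path, ".tmp")
  saveRDS(fetch_last_year(today), tmp)
  file.rename(tmp, path)
  invisible(path)
}

# What the Shiny app would call at startup instead of querying BigQuery.
load_cache <- function(path) {
  readRDS(path)
}

# Example run, using a temporary directory in place of the server's disk.
cache_path <- file.path(tempdir(), "timeseries_cache.rds")
refresh_cache(cache_path)
daily <- load_cache(cache_path)
```

With this layout BigQuery would be queried once per refresh rather than once per user session, but it does not by itself answer the question of how many copies of `daily` end up in RAM.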
Sorry for the confused question but… well, I’m confused.